dennisgorelik | Entries tagged with programming

~~~~~
https://www.quora.com/What-are-some-signs-that-you-have-become-a-better-software-engineer/answer/Ben-Podgursky
When you start out, you’re confident “bah, any decent answer on StackOverflow has at least 1,000 upvotes”
.....
A few months in, you realize that mostly only the “help how i code” questions have thousands of upvotes. There are tons of useful answers in the hundreds range
.....
Until you finally become a “senior” engineer. You try to drag one last sip from the StackOverflow Slurpee, and you get nothing but ice and disappointment
~~~~~

Crossposts: https://dennisgorelik.livejournal.com/178704.html

From 11 years of maintaining my own codebase I learned that reusing fields is a bad idea that leads to poor code maintainability.
If unrelated methods use the same fields, then the graph of field references starts to look like a maze that is very hard to understand.
If the "UserId" field is referenced by 19 methods from at least 2 distinct logical groups, then it takes a long time to find out whether we still need to load UserId from the database record in JobAlertRequest().
If the count of UserId references were much lower, then such a review would take much less time.

Reusing fields (and local variables) is, generally, bad for maintenance.
But reusing methods is, generally, good for maintenance: if we fix a bug in a reused method, every place that uses that functionality gets fixed.
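
A rough sketch of the difference, in JavaScript. Only UserId and JobAlertRequest come from the post above; every other name here is hypothetical and chosen for illustration.

```javascript
// Reused field: unrelated methods all read this.userId, so deciding whether
// the field can be dropped means reviewing every one of these references.
class JobAlertRequestWithSharedField {
  constructor(record) {
    this.userId = record.userId; // loaded once, used from many places
  }
  buildEmailSubject() {
    return 'Job alert for user ' + this.userId;
  }
  buildAuditLine() {
    return 'audit:' + this.userId;
  }
}

// Reused method: one place to fix, and every caller gets the fix.
function formatUserId(userId) {
  return String(userId).trim();
}

// Narrow scope: each function receives only the value it needs,
// so its dependency on userId is visible in the signature.
function buildEmailSubject(userId) {
  return 'Job alert for user ' + formatUserId(userId);
}

function buildAuditLine(userId) {
  return 'audit:' + formatUserId(userId);
}

console.log(buildEmailSubject(' 42 '));
console.log(buildAuditLine(42));
```

With parameters, a search for callers of one function answers the "do we still need this?" question; with the shared field, the same question requires reading the whole class.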

Crossposts: https://dennisgorelik.livejournal.com/171238.html

--
https://news.ycombinator.com/item?id=14990587
I think it's more likely that your "great programmers" simply understand the difference between the same functionality and accidentally similar functionality. The latter is where you have two use cases that are very similar, so you spend all this time deduplicating. Then one of the use cases changes... The correct response would be to duplicate the code again, because the two use cases are no longer similar. In reality, they should have never been combined in the first case. They weren't the same; they were only accidentally similar.
But instead what you usually see is minor tweaks to the common functions. Pass in a flag here, tweak the inputs there, add an if statement over yonder... And before you know it, it's all a terrible tangled mess that is full of branches and technical debt. The two use cases have the same functions, but don't even follow the same branches within the functions.
--
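
A small illustration of the point made in that comment, with hypothetical function names. The first function is what "accidentally similar" code tends to look like after it was merged and then patched with a flag; the other two are the same use cases kept apart.

```javascript
// Merged version: one function, but the two use cases take different
// branches inside it, so they share a name and little else.
function formatAmountShared(amount, isInvoice) {
  let text = amount.toFixed(2);
  if (isInvoice) {
    text = text.padStart(10, ' '); // invoices want right-aligned columns
  } else {
    text = '$' + text; // the shopping cart wants a currency sign
  }
  return text;
}

// Separate versions: each can change without touching the other.
function formatCartPrice(amount) {
  return '$' + amount.toFixed(2);
}

function formatInvoiceAmount(amount) {
  return amount.toFixed(2).padStart(10, ' ');
}

console.log(formatCartPrice(3.5));
console.log(formatInvoiceAmount(3.5));
```

Each new requirement for one use case adds another flag and another branch to the merged version, while the separate versions stay two short functions.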

Crossposts: http://dennisgorelik.livejournal.com/141014.html

ElasticSearch team defends the bloat in ElasticSearch Percolator 5.4
--------
https://github.com/elastic/elasticsearch/issues/25308
If you're not interested in ranking you can easily turn it off, by wrapping the percolate query in a constant_score query.
.....
The percolator tries to tag the queries automatically based on the containing query terms. However it can't do this for all percolator queries, because the percolator doesn't know how to extract meaningful information during indexing for all queries. This is a work in progress and will get better over time. It already has shown a significant performance improvement for cases where the percolator was able to analyze the percolator query correctly at index time.
--------

1) Funny how, in order to turn off an unneeded feature, application developers have to create an extra wrapper around their query.

2) "work in progress" did not stop the ElasticSearch team from breaking backward compatibility and forcing their users to rewrite their legacy code in favor of the "work in progress" ElasticSearch 5.4.

3) "a significant performance improvement" is not quantified, and the cases where that improvement happened are not described.
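
For reference, the wrapper from point 1 would look roughly like this as a search request body, written here as a JavaScript object. This is a sketch under assumptions: the percolator field name "query", the document contents and the "doc" type are placeholders, and the document_type parameter is included because, as far as I understand the 5.x documentation, the percolate query expects it in that version.

```javascript
// The percolate query on its own. Field name, document contents and
// document_type value are placeholders, not taken from the issue.
const percolateQuery = {
  percolate: {
    field: 'query',
    document_type: 'doc',
    document: { message: 'senior software engineer' },
  },
};

// The same query wrapped in constant_score, so that every matching
// stored query gets the same score instead of being ranked.
const searchBody = {
  query: {
    constant_score: {
      filter: percolateQuery,
    },
  },
};

console.log(JSON.stringify(searchBody, null, 2));
```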

See also: ElasticSearch Percolator Bloat - part 1

Crossposts: http://dennisgorelik.livejournal.com/137286.html

Today I learned that the Spark programming language has nothing to do with Apache Spark.

That "Spark Language vs Apache Spark" confusion is even worse than the "Java vs JavaScript" confusion.

Crossposts: http://dennisgorelik.livejournal.com/136300.html

By juan-gandhi:
---
1) Mutable keys in a "hashmap".
2) A stack for registering data so that it can be checked later that no garbage is left behind. Since the pointer to the stack is global and several threads are running, it is pure absurdity.
3) A class with 183 methods, 0 tests.
4) Tests that regularly crash at random, and a sacred belief that "nothing has broken here in the last few years".
5) A belief that everything we have is very "efficient", alongside regular complaints from users that our code is very slow, unlike the Scala-based competitor (!)
6) "Optimistic merge": "it is not my test that failed, it has nothing to do with me, we need to release".
---

1) I wonder why anyone would need to make mutable keys in a hashmap?
2) I personally have not run into this, although when working in a multithreaded environment you can make all kinds of blunders...
3), 4), 5), 6) - I have run into these in one form or another.

Crossposts: http://dennisgorelik.livejournal.com/127667.html

From "Committing code often" discussion:
It is ok to make mistakes, especially in the first "rapid-fire" version of the code.
There is no shame in it.

Even more: if you are not making any mistakes while coding, that means you are being way too careful and are working much slower than you can.
(That does not mean, of course, that you should make mistakes intentionally. Just take greater risks in order to improve your coding speed until you start getting occasional mistakes).

Crossposts: http://dennisgorelik.livejournal.com/127342.html

Normally, the Uri() constructor throws UriFormatException on invalid input. But with really weird input, the Uri(baseUri, Url) overload can throw NullReferenceException:

[TestMethod]
[ExpectedException(typeof(NullReferenceException))]
public void UriFailureTest()
{
    // The second argument has only a single slash after "https:".
    new Uri(
        new Uri("https://jobs.web.cern.ch/content/cern-jobs-insight/what-are-we-doing-while-you%E2%80%99re-waiting"),
        "https:/jobs.web.cern.ch/content/cern-jobs-insight/what-are-we-doing-while-you%E2%80%99re-waiting");
}

My favorite Scala evangelist is trying to reinvent the wheel in caching.

He enjoys playing with monads and fancy terms like side-effects, but forgets about software development basics -- clearly defining the real-world problem that he is trying to solve:
---
Dennis: Do you mean you do not support a scenario when multiple users are requesting your web app ~simultaneously?
ivan_gandhi: I do not condone sharing code with effects between threads.
---

It is not clear whether his code is supposed to work in a multithreaded environment (like serving incoming web requests from multiple users) or in a single-threaded one (like outgoing requests from a single-threaded service).
It is not clear whether he wants to reuse the cache between parallel threads or not.
The end result -- code without purpose. Such purposeless code is impossible to evaluate meaningfully.
On the other hand, such an approach gives a lot of room for inventing a Cartesian product of input sources and time segments.

My takeaway from it is that Scala inspires excessive FP games and therefore distracts developers from solving real-life problems.

When users open my web site, I want to know what JavaScript errors they get (if any).
That's why I append this JavaScript to almost every page on my web site:

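// Report every uncaught JavaScript error back to the server.
window.onerror = function(errmessage, errurl, errline) {
	// Small query-string builder; null and undefined values are skipped.
	var params = {
		list: [],
		add: function(name, value) {
			if (value != null) this.list.push(name + '=' + encodeURIComponent(value));
			return this;
		},
		toString: function() {
			if (this.list.length) return '?' + this.list.join('&');
			return '';
		}
	};
	// Setting the image source sends a GET request to the /jeh endpoint.
	new Image().src='/jeh' + params.add('errmessage', errmessage)
		.add('errurl', errurl)
		.add('errline', errline)
		// Random number from 1 to 10, sent as the "r" parameter.
		.add('r', Math.floor((Math.random() * 10) + 1));
};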

That script reports JavaScript errors from the user's browser back to our server.
Once per day our server aggregates these errors and emails a JavaScript Errors report to the developers.
This is an example of what that report looks like:

We then review these errors case by case and decide whether to fix each error or to suppress it from the report (because we cannot fix it).
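
The daily aggregation step could be sketched like this. It is an illustration only, not the code that runs on the server: it assumes each logged report is an object with the same errmessage, errurl and errline values that the script above sends, and the function name is made up.

```javascript
// Groups identical error reports and counts them, most frequent first.
// Assumed input shape: [{ errmessage, errurl, errline }, ...]
function aggregateErrors(reports) {
  const counts = new Map();
  for (const report of reports) {
    const key = report.errmessage + ' @ ' + report.errurl + ':' + report.errline;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([error, count]) => ({ error, count }));
}

// Made-up sample data for one day.
const sampleReports = [
  { errmessage: 'Uncaught ReferenceError: google is not defined', errurl: '/search', errline: 12 },
  { errmessage: 'Uncaught TypeError: x is undefined', errurl: '/jobs', errline: 40 },
  { errmessage: 'Uncaught ReferenceError: google is not defined', errurl: '/search', errline: 12 },
];

console.log(aggregateErrors(sampleReports));
```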

Still, there are challenges: sometimes it is hard to separate errors that we can fix from errors that we cannot fix.
For example, we cannot fix the most frequent error, "Uncaught ReferenceError: google is not defined", because it is caused by the occasional browser that does not work well with the Google Maps API.
But we do not want to suppress that error either, because sometimes, by mistake, we may introduce a problem in our own JavaScript that would generate the same error message on a mass scale for our users.
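
One possible middle ground between reporting and suppressing is to keep such an error on a watch list and report it only when its daily count jumps above the usual background level. The sketch below is an assumption about how that could work, not something the site does today; the threshold of 20 and all names are made up.

```javascript
// Errors we cannot fix, with the daily count we treat as normal noise.
// The number is invented for illustration.
const watchList = new Map([
  ['Uncaught ReferenceError: google is not defined', 20],
]);

// Input: [{ error, count }, ...] for one day.
// Unknown errors are always reported; watched errors only on a spike.
function selectForReport(dailyCounts) {
  return dailyCounts.filter(({ error, count }) => {
    const usualMax = watchList.get(error);
    return usualMax === undefined || count > usualMax;
  });
}

const quietDay = [
  { error: 'Uncaught ReferenceError: google is not defined', count: 5 },
  { error: 'Uncaught TypeError: x is undefined', count: 1 },
];
const brokenDeployDay = [
  { error: 'Uncaught ReferenceError: google is not defined', count: 500 },
];

console.log(selectForReport(quietDay));
console.log(selectForReport(brokenDeployDay));
```

The hard part that this does not solve is picking the threshold: it has to be reviewed as traffic changes.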

See: discussion in ivan-gandhi's blog.

Lots of applications need to load document files of different formats and convert them into other formats or into text.
You would think that there would be a good solution for that.
Unfortunately, that is not the case.
Existing solutions are either desktop-only, buggy, or extremely expensive (~$10K/year).

I thought I had found a solution - DevExpress Document Server library for $599.99

Unfortunately, after running for a couple of weeks, it crashed my service with a StackOverflowException:
----
https://www.devexpress.com/Support/Center/Question/Details/T257097
To my regret, there is no simple workaround to avoid this exception with your document. Regarding the time frame for fixing this issue, it is difficult to provide any estimate in such cases.
----

So now I need to find a way to prevent my service from dying when some random document is fed into it.

Sigh.
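
One common way to contain this kind of crash is to run the conversion in a separate process, since in .NET a StackOverflowException cannot be caught and takes down the whole process. The idea is sketched below in JavaScript for Node.js rather than in .NET, with a made-up converter that "crashes" on input containing "BAD"; all names are hypothetical and this is not how the service is actually built.

```javascript
const { spawnSync } = require('child_process');

// Stand-in for a converter that may crash on some documents.
// It reads the document from stdin and exits with code 3 on "BAD" input.
const workerSource = `
  const input = require('fs').readFileSync(0, 'utf8');
  if (input.includes('BAD')) process.exit(3);
  process.stdout.write(input.toUpperCase());
`;

// Runs the converter in a child process, so a crash or a hang there
// ends only the child and not the calling service.
function convertIsolated(documentText, timeoutMs = 10000) {
  const result = spawnSync(process.execPath, ['-e', workerSource], {
    input: documentText,
    encoding: 'utf8',
    timeout: timeoutMs,
  });
  if (result.error) {
    // Failed to start, or killed after the timeout.
    return { ok: false, reason: result.error.message };
  }
  if (result.status !== 0) {
    return { ok: false, reason: 'converter exited with ' + (result.signal || result.status) };
  }
  return { ok: true, text: result.stdout };
}

console.log(convertIsolated('hello'));
console.log(convertIsolated('BAD document'));
```

The cost is a process start per document (or per batch), which is usually acceptable for a background conversion service.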