Jan. 28th, 2017

dennisgorelik: (Default)
Our web crawler hung. Again.
It seems to hang about once per 1 million web page downloads.
Does not time out. Does not crash. Just hangs.
Fortunately, the other threads in our PostJobFreeService keep running.
(Until the AngleSharp HTML parser crashes the whole PostJobFreeService on some weird HTML page, of course.)

Unfortunately, the crawler hang is not reproducible: the same page downloads without any problems on the next attempt, or in a browser.
But about once in 1M downloads something weird happens: our crawler successfully gets past the HTTP handshake with the remote web server (so no HTTP connection timeout fires), but then hangs.
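
One pattern that would match this symptom is a server that accepts the connection and then stalls or trickles the response body, so that no single operation ever trips a timeout. That is only a guess; the actual cause here is unknown. A minimal sketch of a total time budget on the body read, written in Python rather than .NET for brevity. The names read_body and read_chunk are hypothetical, and read_chunk is assumed to carry its own per-read timeout:

```python
import time


def read_body(read_chunk, total_deadline_seconds, clock=time.monotonic):
    """Read chunks until read_chunk() returns b"", enforcing a total budget.

    Assumption: read_chunk has its own per-read timeout (for example a
    socket timeout), so a single call cannot block forever. This loop only
    adds a cap on the time spent across all reads.
    """
    started = clock()
    chunks = []
    while True:
        # Check the budget before every read, so a slow trickle of tiny
        # chunks cannot keep the download alive indefinitely.
        if clock() - started > total_deadline_seconds:
            raise TimeoutError("download exceeded total deadline")
        chunk = read_chunk()
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)
```

The budget is checked between reads, so this guards against slow responses but not against a single read that never returns; that case needs the per-read timeout or an external watchdog.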

For our crawler we use the standard HttpWebRequest class from the .NET Framework.
Should we crawl with something else?

Or is it inevitable that a web crawler will eventually hang, so our watchdog should simply restart the corresponding thread?
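
The watchdog idea can be sketched roughly as follows. This is an illustration in Python, not our actual .NET code: run_with_deadline, crawl and DownloadHung are hypothetical names, and download stands for whatever function fetches a page. Each download runs in its own daemon thread, and the crawler gives up on it after a hard deadline and moves on.

```python
import threading


class DownloadHung(Exception):
    """Raised when a download exceeds its hard deadline."""


def run_with_deadline(download, url, deadline_seconds):
    """Run download(url) in a daemon thread; give up after deadline_seconds."""
    result = {}

    def worker():
        try:
            result["value"] = download(url)
        except Exception as exc:  # hand worker failures back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_seconds)
    if thread.is_alive():
        # The hung worker is abandoned, not killed: Python cannot kill a
        # thread. Being a daemon, it will not block process exit.
        raise DownloadHung(url)
    if "error" in result:
        raise result["error"]
    return result["value"]


def crawl(urls, download, deadline_seconds):
    """Download every URL, recording the ones that hung instead of stalling."""
    pages, hung = {}, []
    for url in urls:
        try:
            pages[url] = run_with_deadline(download, url, deadline_seconds)
        except DownloadHung:
            hung.append(url)
    return pages, hung
```

An abandoned thread still holds its socket and memory until the process exits, so at 1 hang per 1M downloads this leaks slowly; running downloads in a separate process that can be killed would be the cleaner variant.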

Discussion in Livejournal: http://dennisgorelik.livejournal.com/124693.html
dennisgorelik: (Default)
A job seeker asked me whether "Reliance Capital Limited", a company that contacted her, is a legitimate employer.
Judging by the company name and the way they communicated with her (text only), it is probably a scam.

But to find out for sure whether a recruiter is the real deal:
1) Call them (scammers frequently try to avoid talking on the phone and hide behind text messages and emails).
2) Assume that there could be a scammer on the other end, and do not reveal your sensitive personal details.
3) Watch for typical signs of scammers:
- Bad phone connection quality (because scammers frequently use an internet proxy).
- _Heavy_ foreign accent from a poor country (typically a Nigerian accent, but occasionally Russian or some other accent). The scammer may insist that they are in the US or in London.
- Incoherent business story (ask them what they sell to their customers).


Dennis Gorelik

Page generated Sep. 21st, 2017 08:46 am