dennisgorelik: 2020-06-13 in my home office (Default)
https://youtu.be/UGL_OL3OrCE?t=1173
"So we do calibration at the same time as imaging"

~~~~~~~~~~~~
https://en.wikipedia.org/wiki/Calibration
Calibration in measurement technology and metrology is the comparison of measurement values delivered by a device under test with those of a calibration standard of known accuracy.
~~~~~~~~~~~~

The only image of "known accuracy" that these scientists had during black hole imaging -- was a theoretical image of what that black hole should look like.
But it is invalid to use that theoretical image as proof of its own correctness.

So this team of "black hole photographers":
1) Took extremely sparse signals from their several telescopes.
2) "Calibrated" their "signal interpretation" algorithm based on the theoretical black hole image (that they wanted to see).
3) Used that "calibrated" "signal interpretation" algorithm to interpret the sparse signals.
4) Not surprisingly, their "signal interpretation" algorithm produced the theoretical black hole image that these "photographers" wanted to see.

What these "black hole photographers" did is NOT science, but scientific scam.

That explains why these "photographers", instead of photographing Sagittarius A* (which is 26 thousand light years away), chose to photograph Messier 87 (which is 53 million light years away -- 2000 times further!).

At shorter distances there is not enough room for creative "calibration" of sparse signals.

See also:
Extracting a black hole image from "sparse telescope matrix"
dennisgorelik: 2020-06-13 in my home office (Default)
Business context
For years I wanted to collect new jobs from all over the internet in order to send appealing job alert emails to candidates who created a profile on postjobfree.com.
So, finally, I decided to create a web crawler for that.
However, unlike Google, I do not want to crawl billions of pages (too expensive). Several million pages should be good enough for the first working prototype.
The question is - how to determine automatically what pages to crawl and what to ignore?
That's why our web crawler is combined with a self-learning neural network.

Data structure
We represent every page as a record in the PageNeuron table (PageNeuronId int, Url varchar(500), …, PageRank real, ...).
We represent links from page to page in the LinkAxon table (..., FromPageNeuronId int, ToPageNeuronId int, …).
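A minimal Python sketch of these two records, assuming the in-memory shape simply mirrors the columns named above. The elided columns stay elided, and the "weight" field on LinkAxon is an assumption based on the "LinkAxon weights" mentioned in the next section.
~~~~~~~~~~~~
from dataclasses import dataclass

@dataclass
class PageNeuron:
    # Only the columns named in the post; the rest of the schema is elided there too.
    page_neuron_id: int   # PageNeuronId int
    url: str              # Url varchar(500)
    page_rank: float      # PageRank real

@dataclass
class LinkAxon:
    # Directed link between two PageNeuron records.
    from_page_neuron_id: int  # FromPageNeuronId int
    to_page_neuron_id: int    # ToPageNeuronId int
    weight: float             # assumed field, based on the "LinkAxon weights" used below
~~~~~~~~~~~~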

PageRank calculations
PageRank is inspired by classic Google PageRank, however we calculate it differently.
Instead of calculating the probability of a visitor click, our NeuralRewardDistribution process distributes PageRank from every PageNeuron record to every connected record (in both directions).
With every "reward distribution" iteration the NeuralRewardDistribution process distributes about 10% of PageRank to other pages (that amount is split between all reward-destination PageNeuron records proportionally to LinkAxon weights).
Then, in order to prevent self-excitation of the system, NeuralRewardDistribution applies "forgetting" by reducing the PageRank of the original page by 10%.
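A minimal sketch of one such iteration, assuming one plausible reading of the two rules above: every page copies about 10% of its PageRank to its connected pages (both directions, split by LinkAxon weight) and then loses 10% of its own PageRank to forgetting. The dict/tuple representation and function names are illustrative, not the actual implementation.
~~~~~~~~~~~~
from collections import defaultdict

DISTRIBUTION_SHARE = 0.10  # "about 10% of PageRank" handed out per iteration
FORGETTING_RATE = 0.10     # the original page loses 10% to prevent self-excitation

def reward_distribution_iteration(page_rank, link_axons):
    """page_rank: {page_neuron_id: rank}; link_axons: list of (from_id, to_id, weight)."""
    # Links are treated as bidirectional for reward distribution.
    neighbors = defaultdict(list)
    for from_id, to_id, weight in link_axons:
        neighbors[from_id].append((to_id, weight))
        neighbors[to_id].append((from_id, weight))

    incoming = defaultdict(float)
    for page_id, rank in page_rank.items():
        connected = neighbors[page_id]
        total_weight = sum(weight for _, weight in connected)
        if total_weight == 0:
            continue  # isolated page: nothing to distribute
        budget = DISTRIBUTION_SHARE * rank
        for other_id, weight in connected:
            incoming[other_id] += budget * weight / total_weight

    # Apply forgetting to every page, then add whatever it received from its neighbors.
    return {page_id: rank * (1 - FORGETTING_RATE) + incoming[page_id]
            for page_id, rank in page_rank.items()}
~~~~~~~~~~~~
Under this reading the total PageRank in the graph stays roughly constant: the 10% each connected page sheds through forgetting is roughly the 10% it hands out as rewards.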

Setting goals
When NeuralPageEvaluator parses crawled pages, it tries to detect the words and patterns we need.
Every time NeuralPageEvaluator finds something useful, it adds a reward in the form of extra PageRank to the responsible PageNeuron record (see the sketch after this list). For example, we reward:
- 1 PageRank point for words such as "job", "jobs", "career", "hr".
- 10 PageRank points for words such as "hrms", "taleo", "jobvite", "icims".
- 1000 PageRank points when the parser discovers a link to a new XML job feed in the content of a PageNeuron record.
- 20 PageRank points when the parser discovers a link to an XML job feed that we already discovered in the past.
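A rough sketch of how those rewards might be added up for a single crawled page. The point values come from the list above; the word matching, the feed detection, and the assumption that each keyword is rewarded once per page (rather than once per occurrence) are all illustrative guesses.
~~~~~~~~~~~~
import re

KEYWORD_REWARDS = {
    "job": 1, "jobs": 1, "career": 1, "hr": 1,
    "hrms": 10, "taleo": 10, "jobvite": 10, "icims": 10,
}
NEW_FEED_REWARD = 1000   # link to an XML job feed we have not seen before
KNOWN_FEED_REWARD = 20   # link to an XML job feed already discovered in the past

def evaluate_page(page_text, feed_links, known_feeds):
    """Return the extra PageRank earned by one crawled page.

    feed_links: XML job-feed URLs extracted from the page (detection logic elided).
    known_feeds: set of feed URLs discovered earlier."""
    reward = 0.0
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    for word, points in KEYWORD_REWARDS.items():
        if word in words:
            reward += points
    for feed_url in feed_links:
        reward += KNOWN_FEED_REWARD if feed_url in known_feeds else NEW_FEED_REWARD
    return reward
~~~~~~~~~~~~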

What to crawl
NeuralPageProcessor processes already crawled pages (PageNeuron records) by passing them to NeuralPageEvaluator.
NeuralPageEvaluator returns a collection of outgoing links from the parsed page.
If an extracted outgoing link is new, then NeuralPageProcessor creates a new PageNeuron record for it. For its initial PageRank it uses 10% of the source PageNeuron record's PageRank, multiplied by that new URL's link share among all other URLs that the source PageNeuron record points to.
NeuralCrawler crawls the new PageNeuron records with the highest PageRank.
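The initial PageRank formula and the crawl ordering, sketched under the same assumptions as the earlier snippets (illustrative names; the "link share" is taken to be this link's weight divided by the total outgoing weight of the source page).
~~~~~~~~~~~~
INITIAL_SHARE = 0.10  # new pages inherit a slice of 10% of the source page's PageRank

def initial_page_rank(source_page_rank, link_weight, total_outgoing_weight):
    """PageRank assigned to a newly discovered URL: 10% of the source page's
    PageRank, multiplied by this link's share among all the source's outgoing links."""
    return INITIAL_SHARE * source_page_rank * (link_weight / total_outgoing_weight)

def pick_pages_to_crawl(uncrawled_pages, batch_size):
    """uncrawled_pages: {url: page_rank} of PageNeuron records not yet crawled.
    NeuralCrawler takes the highest-PageRank records first."""
    ranked = sorted(uncrawled_pages.items(), key=lambda item: item[1], reverse=True)
    return [url for url, _ in ranked[:batch_size]]
~~~~~~~~~~~~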

Cleanup
NeuralRewardDistribution deletes PageNeuron records (and all corresponding LinkAxon records) whose PageRank is too low.
The current "delete threshold" is PageRank = 0.01, which deletes about half of the ~3 million PageNeuron records we have created so far.
