Admittedly, our Chuck Norris is actually called Bob (or Robert) Morris and counts probabilistically, but the Morris Counter he invented is nevertheless a nice introduction to the exciting ways of thinking in the world of big data. If you're now thinking, "Probabilistic counting? What is he talking about?", dive into the next section with me, where I will reveal the secret of the Morris Counter and use it as a basis for explaining other approaches, such as the aforementioned HyperLogLog or BloomFilter.
The so-called Morris counter was designed in the early years of computer science to save storage space when using counters on what was then still very modest hardware [1]. Today's computers also work with the binary system and store numbers in so-called bits, which are set to either 0 or 1 (with the exception of future quantum computers). Each of these bits can therefore represent two different values; two bits together correspond to two times two, i.e. four values, three bits to two times two times two, i.e. eight values, and so on.
Or to look at it the other way around: to be able to count up to 1,000, for example, we need at least 10 bits, since 2¹⁰ = 1,024. To determine how many bits are needed to store a given decimal number, the logarithm to base 2 can be used: the result of log₂(1,000) ≈ 9.9658 must logically be rounded up to 10, since fractions of bits make no sense for storing data. Robert Morris faced exactly this problem in the 1970s: he only had an 8-bit register available for counting events, but he had to count much further than 255.
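To make this rounding-up step concrete, here is a minimal Python sketch; the function name bits_needed is my own illustrative choice and not part of the original text. It computes how many bits are required to hold every value from 0 up to a given maximum:

```python
import math

def bits_needed(max_value: int) -> int:
    """Bits required to store all values from 0 up to max_value."""
    # log2 gives a fractional bit count; we round up because a
    # fraction of a bit cannot be stored.
    return math.ceil(math.log2(max_value + 1))

print(bits_needed(1000))  # -> 10, since 2**10 = 1,024 covers 0..1000
print(bits_needed(255))   # -> 8, the size of Morris' 8-bit register
```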
To get by with fewer bits when counting, there is a simple idea: we only count every second occurrence of an event and multiply the counter by two at the end to get back to the correct value. This way, we can get by with one bit less of memory. Or we only count every fourth occurrence and thus save two bits, and so on.
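The following is a small, hypothetical Python sketch of this "count only every k-th occurrence" idea; the function name count_scaled and its parameters are illustrative assumptions, not part of Morris' original design:

```python
def count_scaled(events, skip=2):
    """Count only every `skip`-th event and scale the result back up."""
    stored = 0                      # value held in the (smaller) register
    for i, _ in enumerate(events):
        if i % skip == 0:           # register only every skip-th occurrence
            stored += 1
    return stored * skip            # multiply back to the approximate total

print(count_scaled(range(1000), skip=2))  # 1000, needs one bit less
print(count_scaled(range(1000), skip=4))  # 1000, needs two bits less
```

Skipping events deterministically like this already saves bits; as the next section will show, the Morris counter takes the same idea further by making the decision of when to increment probabilistic.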