
Histogram String Hash

This page presents results from the compressed histogram hash program for strings. Source code for the program is available here:

http://mcky.net/bin/hsh/hshs.tar.gz

http://mcky.net/bin/hsh/hshs.zip

Here is the annotated program output for 10 million strings:

strings                        10000000    Create 10 million 4 to 24 character strings
characters                     154993373   About 155 million characters
count sort time                5.7200      Count sort time in seconds
count sort                     ok          Count sort output checked
unique strings                 9999872     Removed duplicate strings
characters in unique strings   144991653   About 145 million characters
data size                      184991141   Unique string characters plus size of indices, in bytes
hash time                      4.1100      Time to create hash in seconds
hash bytes                     147658209   Size of the hash in bytes
compression                    0.798191    Hash size divided by data size
hash check                     ok          More than 19 million accesses
hash check time                4.8800      Hash check time in seconds
quick sort time                11.1300     Quick sort time in seconds
quick sort                     ok          Quick sort results checked
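
The data size and compression rows can be recomputed from the other rows. The short Python sketch below is only an illustration and is not part of the hshs program. It assumes one 4-byte index per unique string; the output does not state the index width, but 4 bytes is consistent with the reported figures.

```python
# Figures copied from the 10 million string run above.
unique_strings = 9_999_872
unique_chars = 144_991_653
data_size = 184_991_141
hash_bytes = 147_658_209

# Assumption (not stated in the program output): each unique string
# carries one 4-byte index.
INDEX_BYTES = 4

index_size = unique_strings * INDEX_BYTES      # bytes used by indices
reconstructed_size = unique_chars + index_size # characters plus indices
compression = hash_bytes / data_size           # hash size / data size

print("index bytes:       ", index_size)
print("data size matches: ", reconstructed_size == data_size)
print("compression:       ", f"{compression:.6f}")
```

Under that assumption the reconstructed data size equals the reported 184991141 bytes, and the ratio rounds to the reported 0.798191.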

Here is the collated output of several runs of the string hash program, one column per run:

String hash results

strings                          100000     200000     500000    1000000    2000000    5000000   10000000
characters                      1550458    3099449    7750695   15507989   31008965   77510230  154993373
count sort time                  0.0300     0.0600     0.1600     0.3600     0.8200     2.6600     5.9200
count sort                           ok         ok         ok         ok         ok         ok         ok
unique strings                    99840     199936     499968     999936    1999872    4999936    9999872
characters in unique strings    1448086    2898463    7250167   14507026   29007083   72509340  144991653
data size                       1847446    3698207    9250039   18506770   37006571   92509084  184991141
hash time                        0.0200     0.0400     0.1400     0.3300     0.7100     1.9700     4.1700
hash bytes                      1584205    3141180    7792923   15549750   31049743   73098644  147658209
compression                    0.857511   0.849379   0.842475   0.840220   0.839033   0.790178   0.798191
hash check                           ok         ok         ok         ok         ok         ok         ok
hash check time                  0.0500     0.0800     0.2900     0.6600     1.3100     2.3500     4.9000
quick sort time                  0.0500     0.1200     0.3400     0.7700     1.7300     5.0000    11.2400
quick sort                           ok         ok         ok         ok         ok         ok         ok
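
To compare the timings across runs, it can help to normalize them. The Python sketch below is an illustration only, using numbers copied from the table above. It computes the quick sort to count sort time ratio and the hash build time per input string for each run. The variable names are hypothetical and do not come from the hshs source. The smallest runs are timed at 0.01 second resolution, so their ratios are coarse.

```python
# One tuple per run, copied from the collated table:
# (strings, count sort time, hash time, quick sort time), times in seconds.
runs = [
    (100_000,    0.03, 0.02,  0.05),
    (200_000,    0.06, 0.04,  0.12),
    (500_000,    0.16, 0.14,  0.34),
    (1_000_000,  0.36, 0.33,  0.77),
    (2_000_000,  0.82, 0.71,  1.73),
    (5_000_000,  2.66, 1.97,  5.00),
    (10_000_000, 5.92, 4.17, 11.24),
]

# Ratio of quick sort time to count sort time for each run.
sort_ratios = [quick / count for _, count, _, quick in runs]

# Hash build time per input string, in microseconds.
hash_us_per_string = [hash_s / n * 1e6 for n, _, hash_s, _ in runs]

for (n, _, _, _), ratio, us in zip(runs, sort_ratios, hash_us_per_string):
    print(f"{n:>10}  quick/count = {ratio:.2f}  hash = {us:.3f} us/string")
```

In every run listed, quick sort takes between about 1.6 and 2.2 times as long as count sort.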