shellreef What are associative arrays? Associative arrays are arrays indexed by string instead of a number. Normally, an array element is accessed by adding the index (offset) to the beginning and using the value located there. Assocative arrays are a bit more complicated -- the hash (sum) of the key is used as the offset. Associative arrays vs. normal arrays indexed by constants Most of the time a constant can be used to make a normal array accept names rather than numbers as the index. For example: CONST foo = 0 DIM array(10) array(foo) = 42 PRINT array(foo) However this has some problems. Keys are defined at compile-time; you cannot define your own keys while the program is running. Associative arrays, on the other hand, allow arrays to be looked up by actual strings. The most simple associative array implementation An easy way to make an assocative array is this by using a TYPE: TYPE element tag AS STRING * 10 value AS STRING * 10 END TYPE DIM SHARED aa(10) AS element DIM SHARED aalast ' Last occupied AA() element FUNCTION getvalue$(tag AS STRING) tag = LTRIM$(RTRIM$(tag)) tag = tag + STRING$(10 - LEN(tag), " ") FOR i = 0 TO aalast IF (tag = aa(i).tag) THEN getvalue$ = aa(i).value EXIT FUNCTION ENDIF NEXT END FUNCTION SUB setvalue (tag AS STRING, value AS STRING) aa(aalast).tag = tag aa(aalast).value = value aalast = aalast + 1 END SUB setvalue "foo", "bar" PRINT getvalue$("foo") ' prints bar This approach has a huge flaw: every time a key is looked up, program execution runs a loop, which is slow. Especially with a large number of items. A TYPE-based associative array just doesn't scale well. Associative Arrays by Hashes The fastest way to implement associative arrays is hashes. A hash is a fixed size integer value determined by feeding a function variable- length-sized data. The hash of a string should be different for most strings, but has to be the same for the same strings. Thus, hashes cannot be randomly generated. Basically, we will use a hash as the index to a real array. What is a checksum/hash? The main purpose of hashes or checksums is to check if data is same on both ends of a communication. A modem might compute a checksum for a file and send the file along with it. The receiving computer would compute the checksum on the data received and compare it with the sent checksum. If the two checksums differ, either the data became corrupted or the checksum got mangled. In any case, the file is discarded and resent. Checksums work well, but there is a small chance of the file being corrupted yet producing the same checksum. A good checksum/hashing algorithm changes even when one bit is inverted, and should avoid this case as much as possible. Simply put, hashes are a number unique to a piece of data. We'll use a hash as an index to a QBasic array. Chosing a hashing algorithm for QBasic arrays CRC32 is a well known checksum algorithm, a Cyclic Redundancy Check composed of 32 bits. 32 bits can contain a maximum value of &HFFFFFF = 4294967295. This is way to large for QBasic's limit of 60 elements per array. Of course, we could use several arrays, and have a multiplex function, but I chose to use a hashing algorithm with a maximum output value of 60 for any input data. This was my initial algorithm: ' Return a value 0-60, unique (as much as possible) to S$ FUNCTION mkhash(s$) DIM ret AS INTEGER FOR i = 1 TO LEN(s$) ret = ret + (ASC(MID$(s$, i, 1)) - 128) NEXT ret = ABS(ret) mkhash = ret END FUNCTION There are some problems with this algorithm. First of all, the return value is not always within the 0..60 range. "foo" is exactly 60. But one of the more serious problems is "foo" has the exact same value as "oof", as does "ofo". Position is not taken into account. This can easily be solved by adding the position in there somewhere. After a few minutes of experimenting, I came up with this: FUNCTION mkhash(s$) DIM ret AS INTEGER FOR i = 1 TO LEN(s$) c$ = MID$(s$, i, 1) c = ASC(c$) ' If position in string is odd, negate current return value ' This keeps the hash value small (can't be over 60) IF (i MOD 1) THEN rr = ret ELSE rr = -ret ENDIF ' Found using trial and error. Not perfect, but works ret = rr + (c * (i / 2)) + i NEXT ' Make hash within 0..60. Does not always work, esp. with huge keys ret = INT(ret / 16) ' Use absolute value of return value. Just in case there is a ' negative. ret = ABS(ret) ' This algorithm cannot handle very large keys IF (ret > 60) THEN PRINT "mkhash: error: key too large: " + s$ ret = 0 ENDIF mkhash = ret END FUNCTION Seems to work well. You may encounter problems if a string contains a large number of high non-ASCII characters, such as CHR$(255). 14 chars is a good limit. Accessor Functions The internal storage for the hash: DIM SHARED hasharray$(60) hasharray$() will not be accessed directly, but via an accessor function named hash$: FUNCTION hash$(s$) hash$ = hasharray$(mkhash(s$)) END FUNCTION Unfortantly, QBasic does not allow opererator overloading, so we'll have to use a sethash SUB for writing to the hash: SUB sethash(key$, value$) hasharray$(mkhash(key$)) = value$ END SUB Putting it all together + an example program ' Warning: Don't expect this to work all the time, keys may overlap with ' others. Test thoughly. ' Instead of accessing directly, use HASH$ and SETHASH accessor methods DIM SHARED hasharray$(60) FUNCTION mkhash(s$) DIM ret AS INTEGER FOR i = 1 TO LEN(s$) c$ = MID$(s$, i, 1) c = ASC(c$) ' If position in string is odd, negate current return value ' This keeps the hash value small (can't be over 60) IF (i MOD 1) THEN rr = ret ELSE rr = -ret ENDIF ' Found using trial and overflow. Not perfect, but works ret = rr + (c * (i / 2)) + i NEXT ' Make hash within 0..60. Does not always work, esp. with >14 huge keys ret = INT(ret / 16) ' Use absolute value of return value. Just in case there is a ' negative. ret = ABS(ret) ' mkhash() only works for small keys IF (ret > 60) THEN PRINT "mkhash: error: key too large: " + s$ ret = 0 ENDIF mkhash = ret END FUNCTION FUNCTION hash$(s$) hash$ = hasharray$(mkhash(s$)) END FUNCTION SUB sethash(key$, value$) hasharray$(mkhash(key$)) = value$ END SUB ' Example program sethash "foo", "bar" ' Set key foo to bar PRINT hash$("foo") ' Should print bar Download source Conclusion I hope this tutorial was useful. If you have any questions, contact me at hash-tutorial@cvsmbf.cjb.net If you're interested in a great tutorial about checksums and CRC32 (using C), visit Ross William's CRC Pitstop, http://www.ross.net/crc/. Perl's hashing algorithm is explained here. -- shellreef
Modified Sun Mar 25 08:48:47 2007
generated Sun Mar 25 08:56:33 2007
http://jeff.tk/qbhash/