shellreef
What are associative arrays?
Associative arrays are arrays indexed by string instead of a number.
Normally, an array element is accessed by adding the index (offset) to
the beginning and using the value located there. Assocative arrays are
a bit more complicated -- the hash (sum) of the key is used as the offset.
Associative arrays vs. normal arrays indexed by constants
Most of the time a constant can be used to make a normal array accept
names rather than numbers as the index. For example:
CONST foo = 0
DIM array(10)
array(foo) = 42
PRINT array(foo)
However this has some problems. Keys are defined at compile-time; you
cannot define your own keys while the program is running. Associative
arrays, on the other hand, allow arrays to be looked up by actual strings.
The most simple associative array implementation
An easy way to make an assocative array is this by using a TYPE:
TYPE element
tag AS STRING * 10
value AS STRING * 10
END TYPE
DIM SHARED aa(10) AS element
DIM SHARED aalast ' Last occupied AA() element
FUNCTION getvalue$(tag AS STRING)
tag = LTRIM$(RTRIM$(tag))
tag = tag + STRING$(10 - LEN(tag), " ")
FOR i = 0 TO aalast
IF (tag = aa(i).tag) THEN
getvalue$ = aa(i).value
EXIT FUNCTION
ENDIF
NEXT
END FUNCTION
SUB setvalue (tag AS STRING, value AS STRING)
aa(aalast).tag = tag
aa(aalast).value = value
aalast = aalast + 1
END SUB
setvalue "foo", "bar"
PRINT getvalue$("foo") ' prints bar
This approach has a huge flaw: every time a key is looked up, program
execution runs a loop, which is slow. Especially with a large number of
items. A TYPE-based associative array just doesn't scale well.
Associative Arrays by Hashes
The fastest way to implement associative arrays is hashes. A hash is
a fixed size integer value determined by feeding a function variable-
length-sized data. The hash of a string should be different for most
strings, but has to be the same for the same strings. Thus, hashes
cannot be randomly generated.
Basically, we will use a hash as the index to a real array.
What is a checksum/hash?
The main purpose of hashes or checksums is to check if data is same on
both ends of a communication. A modem might compute a checksum for
a file and send the file along with it. The receiving computer would
compute the checksum on the data received and compare it with the sent
checksum. If the two checksums differ, either the data became corrupted
or the checksum got mangled. In any case, the file is discarded and
resent.
Checksums work well, but there is a small chance of the file being
corrupted yet producing the same checksum. A good checksum/hashing
algorithm changes even when one bit is inverted, and should avoid this
case as much as possible.
Simply put, hashes are a number unique to a piece of data. We'll use
a hash as an index to a QBasic array.
Chosing a hashing algorithm for QBasic arrays
CRC32 is a well known checksum algorithm, a Cyclic Redundancy Check
composed of 32 bits. 32 bits can contain a maximum value of
&HFFFFFF = 4294967295. This is way to large for QBasic's limit of 60
elements per array. Of course, we could use several arrays, and have
a multiplex function, but I chose to use a hashing algorithm with a
maximum output value of 60 for any input data.
This was my initial algorithm:
' Return a value 0-60, unique (as much as possible) to S$
FUNCTION mkhash(s$)
DIM ret AS INTEGER
FOR i = 1 TO LEN(s$)
ret = ret + (ASC(MID$(s$, i, 1)) - 128)
NEXT
ret = ABS(ret)
mkhash = ret
END FUNCTION
There are some problems with this algorithm. First of all, the return
value is not always within the 0..60 range. "foo" is exactly 60. But
one of the more serious problems is "foo" has the exact same value as
"oof", as does "ofo". Position is not taken into account.
This can easily be solved by adding the position in there somewhere.
After a few minutes of experimenting, I came up with this:
FUNCTION mkhash(s$)
DIM ret AS INTEGER
FOR i = 1 TO LEN(s$)
c$ = MID$(s$, i, 1)
c = ASC(c$)
' If position in string is odd, negate current return value
' This keeps the hash value small (can't be over 60)
IF (i MOD 1) THEN
rr = ret
ELSE
rr = -ret
ENDIF
' Found using trial and error. Not perfect, but works
ret = rr + (c * (i / 2)) + i
NEXT
' Make hash within 0..60. Does not always work, esp. with huge keys
ret = INT(ret / 16)
' Use absolute value of return value. Just in case there is a
' negative.
ret = ABS(ret)
' This algorithm cannot handle very large keys
IF (ret > 60) THEN
PRINT "mkhash: error: key too large: " + s$
ret = 0
ENDIF
mkhash = ret
END FUNCTION
Seems to work well. You may encounter problems if a string contains a
large number of high non-ASCII characters, such as CHR$(255). 14 chars
is a good limit.
Accessor Functions
The internal storage for the hash:
DIM SHARED hasharray$(60)
hasharray$() will not be accessed directly, but via an accessor function
named hash$:
FUNCTION hash$(s$)
hash$ = hasharray$(mkhash(s$))
END FUNCTION
Unfortantly, QBasic does not allow opererator overloading, so we'll have
to use a sethash SUB for writing to the hash:
SUB sethash(key$, value$)
hasharray$(mkhash(key$)) = value$
END SUB
Putting it all together + an example program
' Warning: Don't expect this to work all the time, keys may overlap with
' others. Test thoughly.
' Instead of accessing directly, use HASH$ and SETHASH accessor methods
DIM SHARED hasharray$(60)
FUNCTION mkhash(s$)
DIM ret AS INTEGER
FOR i = 1 TO LEN(s$)
c$ = MID$(s$, i, 1)
c = ASC(c$)
' If position in string is odd, negate current return value
' This keeps the hash value small (can't be over 60)
IF (i MOD 1) THEN
rr = ret
ELSE
rr = -ret
ENDIF
' Found using trial and overflow. Not perfect, but works
ret = rr + (c * (i / 2)) + i
NEXT
' Make hash within 0..60. Does not always work, esp. with >14 huge keys
ret = INT(ret / 16)
' Use absolute value of return value. Just in case there is a
' negative.
ret = ABS(ret)
' mkhash() only works for small keys
IF (ret > 60) THEN
PRINT "mkhash: error: key too large: " + s$
ret = 0
ENDIF
mkhash = ret
END FUNCTION
FUNCTION hash$(s$)
hash$ = hasharray$(mkhash(s$))
END FUNCTION
SUB sethash(key$, value$)
hasharray$(mkhash(key$)) = value$
END SUB
' Example program
sethash "foo", "bar" ' Set key foo to bar
PRINT hash$("foo") ' Should print bar
Download source
Conclusion
I hope this tutorial was useful. If you have any questions, contact me at
hash-tutorial@cvsmbf.cjb.net
If you're interested in a great tutorial about checksums and CRC32
(using C), visit Ross William's CRC Pitstop, http://www.ross.net/crc/.
Perl's hashing algorithm is explained here.
-- shellreef
Modified Sun Mar 25 08:48:47 2007
generated Sun Mar 25 08:56:33 2007
http://jeff.tk/qbhash/