Using Associative Arrays in QBasic

IIRC, this was originally written for Gopus' online QBasic magazine, the QB Inquirer, though it doesn't seem to be in any of the published issues.
shellreef

What are associative arrays?

    Associative arrays are arrays indexed by string instead of a number.
    Normally, an array element is accessed by adding the index (offset) to
    the beginning and using the value located there.  Assocative arrays are
    a bit more complicated -- the hash (sum) of the key is used as the offset.

Associative arrays vs. normal arrays indexed by constants

    Most of the time a constant can be used to make a normal array accept
    names rather than numbers as the index.  For example:

        CONST foo = 0
        DIM array(10)

        array(foo) = 42
        PRINT array(foo)

    However this has some problems.  Keys are defined at compile-time; you
    cannot define your own keys while the program is running.  Associative
    arrays, on the other hand, allow arrays to be looked up by actual strings.

The most simple associative array implementation

    An easy way to make an assocative array is this by using a TYPE:

        TYPE element
            tag AS STRING * 10
            value AS STRING * 10
        END TYPE
        DIM SHARED aa(10) AS element
        DIM SHARED aalast       ' Last occupied AA() element

        FUNCTION getvalue$(tag AS STRING)
        tag = LTRIM$(RTRIM$(tag))
        tag = tag + STRING$(10 - LEN(tag), " ")
        FOR i = 0 TO aalast
            IF (tag = aa(i).tag) THEN
                getvalue$ = aa(i).value
                EXIT FUNCTION
            ENDIF
        NEXT
        END FUNCTION

        SUB setvalue (tag AS STRING, value AS STRING)
        aa(aalast).tag = tag
        aa(aalast).value = value
        aalast = aalast + 1
        END SUB

        setvalue "foo", "bar"
        PRINT getvalue$("foo")    ' prints bar

    This approach has a huge flaw: every time a key is looked up, program
    execution runs a loop, which is slow.  Especially with a large number of
    items.  A TYPE-based associative array just doesn't scale well.

Associative Arrays by Hashes

    The fastest way to implement associative arrays is hashes.  A hash is
    a fixed size integer value determined by feeding a function variable-
    length-sized data.  The hash of a string should be different for most
    strings, but has to be the same for the same strings.  Thus, hashes
    cannot be randomly generated.

    Basically, we will use a hash as the index to a real array.

What is a checksum/hash?

    The main purpose of hashes or checksums is to check if data is same on
    both ends of a communication.  A modem might compute a checksum for
    a file and send the file along with it.  The receiving computer would
    compute the checksum on the data received and compare it with the sent
    checksum.  If the two checksums differ, either the data became corrupted
    or the checksum got mangled.  In any case, the file is discarded and
    resent.

    Checksums work well, but there is a small chance of the file being
    corrupted yet producing the same checksum.  A good checksum/hashing
    algorithm changes even when one bit is inverted, and should avoid this
    case as much as possible.

    Simply put, hashes are a number unique to a piece of data.  We'll use
    a hash as an index to a QBasic array.

Chosing a hashing algorithm for QBasic arrays

    CRC32 is a well known checksum algorithm, a Cyclic Redundancy Check
    composed of 32 bits.  32 bits can contain a maximum value of
    &HFFFFFF = 4294967295.  This is way to large for QBasic's limit of 60
    elements per array.  Of course, we could use several arrays, and have
    a multiplex function, but I chose to use a hashing algorithm with a
    maximum output value of 60 for any input data.

    This was my initial algorithm:

        ' Return a value 0-60, unique (as much as possible) to S$
        FUNCTION mkhash(s$)
        DIM ret AS INTEGER
        FOR i = 1 TO LEN(s$)
            ret = ret + (ASC(MID$(s$, i, 1)) - 128)
        NEXT
        ret = ABS(ret)
        mkhash = ret
        END FUNCTION

    There are some problems with this algorithm.  First of all, the return
    value is not always within the 0..60 range.  "foo" is exactly 60.  But
    one of the more serious problems is "foo" has the exact same value as
    "oof", as does "ofo".  Position is not taken into account.

    This can easily be solved by adding the position in there somewhere.
    After a few minutes of experimenting, I came up with this:

        FUNCTION mkhash(s$)
        DIM ret AS INTEGER
        FOR i = 1 TO LEN(s$)
            c$ = MID$(s$, i, 1)
            c = ASC(c$)
            ' If position in string is odd, negate current return value
            ' This keeps the hash value small (can't be over 60)
            IF (i MOD 1) THEN
                rr = ret
            ELSE
                rr = -ret
            ENDIF
            ' Found using trial and error.  Not perfect, but works
            ret = rr + (c * (i / 2)) + i
        NEXT
        ' Make hash within 0..60.  Does not always work, esp. with huge keys
        ret = INT(ret / 16)
    
        ' Use absolute value of return value.  Just in case there is a
        ' negative.
        ret = ABS(ret)

        ' This algorithm cannot handle very large keys
        IF (ret > 60) THEN
            PRINT "mkhash: error: key too large: " + s$
            ret = 0
        ENDIF
        mkhash = ret
        END FUNCTION

    Seems to work well.  You may encounter problems if a string contains a
    large number of high non-ASCII characters, such as CHR$(255).  14 chars
    is a good limit.

Accessor Functions

    The internal storage for the hash:

        DIM SHARED hasharray$(60)

    hasharray$() will not be accessed directly, but via an accessor function
    named hash$:

        FUNCTION hash$(s$)
        hash$ = hasharray$(mkhash(s$))
        END FUNCTION

    Unfortantly, QBasic does not allow opererator overloading, so we'll have
    to use a sethash SUB for writing to the hash:

        SUB sethash(key$, value$)
        hasharray$(mkhash(key$)) = value$
        END SUB

Putting it all together + an example program

' Warning: Don't expect this to work all the time, keys may overlap with
'          others.  Test thoughly.

' Instead of accessing directly, use HASH$ and SETHASH accessor methods
DIM SHARED hasharray$(60)

FUNCTION mkhash(s$)
DIM ret AS INTEGER
FOR i = 1 TO LEN(s$)
    c$ = MID$(s$, i, 1)
    c = ASC(c$)
    ' If position in string is odd, negate current return value
    ' This keeps the hash value small (can't be over 60)
    IF (i MOD 1) THEN
        rr = ret
    ELSE
        rr = -ret
    ENDIF
    ' Found using trial and overflow.  Not perfect, but works
    ret = rr + (c * (i / 2)) + i
NEXT
' Make hash within 0..60.  Does not always work, esp. with >14 huge keys
ret = INT(ret / 16)
    
' Use absolute value of return value.  Just in case there is a
' negative.
ret = ABS(ret)

' mkhash() only works for small keys
IF (ret > 60) THEN
    PRINT "mkhash: error: key too large: " + s$
    ret = 0
ENDIF
mkhash = ret
END FUNCTION
    
FUNCTION hash$(s$)
hash$ = hasharray$(mkhash(s$))
END FUNCTION

SUB sethash(key$, value$)
hasharray$(mkhash(key$)) = value$
END SUB

' Example program
sethash "foo", "bar"        ' Set key foo to bar
PRINT hash$("foo")          ' Should print bar

Download source

Conclusion

    I hope this tutorial was useful.  If you have any questions, contact me at

        hash-tutorial@cvsmbf.cjb.net

    If you're interested in a great tutorial about checksums and CRC32
    (using C), visit Ross William's CRC Pitstop, http://www.ross.net/crc/.
    Perl's hashing algorithm is explained here.

    -- shellreef

Valid HTML 4.0?

Modified Sun Mar 25 08:48:47 2007 generated Sun Mar 25 08:56:33 2007
http://jeff.tk/qbhash/