3. Data Structures

3.1 Arrays

Sleep scalars store one-piece of data. An int, a string, or perhaps an object. Sometimes it helps to combine multiple pieces of data into one place. To this end Sleep has two data structures: the cuddly array and loveable hash.

An array is a container that stores data in numerical order. You can recall a piece of data from an array based on the data's position in the array. I sometimes refer to this position as the index. Array positions always begin at 0. Array scalars have an at sign (such as @foo) as the first character in their name. This type information in the variable name allows Sleep to create an empty array if one doesn't exist.

$x = 3; @foo[0] = "Raphael"; @foo[1] = 42.5; @foo[2] = "Donatello"; @foo[$x] = "Michelangelo";

This example populates @foo with some values. Referring to @foo will refer to the entire array itself. The square brackets next to the variable name are the index operator. This operator accepts any Sleep expression and tries to recall a value from the data structure to the left of it. In this example I use 0, 1, 2, and $x to indiciate positions of @foo. @foo has the following contents when this example is complete:

The string representation of this same array is:

@('Raphael', 42.5, 'Donatello', 'Michelangelo')

When referencing an array with the index operator, it is acceptable to use negative indices. You can use @foo[-1] to reference the last item of @foo. Most array functions normalize negative indices. Sleep subtracts a negative index from the array size to get the real index.

@array = @("a", "b", "c", "d", "e"); # insert "foo" into second from last element. add(@array, "foo", -2); println(@array); # print last element of @array println(@array[-1]);

@('a', 'b', 'c', 'd', 'foo', 'e') e

You can assign arrays to each other as well. As stated in chapter 2, assigning an array to another array just copies the reference. Both @array's will point to the same data. A change in one array will affect the other array.

@a = @("a", "b", "c"); @b = @a; @b[1] = "!!!"; # see what I mean. println("@a: " . @a); println("@b: " . @b);

@a: @('a', '!!!', 'c') @b: @('a', '!!!', 'c')

The function &size takes an array as a parameter and returns the total number of items in the array.

@a = @("a", "b", "c", "d", "e"); $size = size(@a); println($size);

Use the &remove function to remove an item from an array.

@array = @("a", "b", "c", "3", "blah", 3, 3.0); remove(@array, 3, "b"); println(@array);

@('a', 'c', 'blah', 3.0)

Arrays returned by built-in functions may be read-only. Functions that try to modify a read-only array will fail with a hard error message. See the documentation for an individual function to find out if the return value is read-only or not.

@files = ls("/"); shift(@files);

Warning: array is read-only at shiftls.sl:2

When in doubt, use &copy to copy a read-only array into something less whiney.

Multidimensional Arrays

Arrays are scalars just like numbers, objects, and strings. Since an array is a scalar that holds other scalars, it stands that an array can also hold other arrays. A multidimensional array is an array of arrays.

@data = @( @("a", "b", "c"), @(1, 2, 3, 4), @('.', '!', '#', '*') );

@data is a multidimensional array.

To access an item from @data:

$temp = @data[2][3]; # $temp is now '*'

This example stacks the index operator against another index operator. The index operator acts on the result of the expression to left of it. You can stack indices as deep as you like. Sleep knows from the variable name to create new empty arrays when you index into dimensions that don't yet exist.

Let us get back to the example. I could have setup the @data array with the following code:

@data[0][0] = "a"; @data[0][1] = "b"; @data[0][2] = "c"; @data[1][0] = 1; @data[1][1] = 2; @data[1][2] = 3; @data[1][3] = 4; @data[2][0] = '.'; @data[2][1] = '!'; @data[2][2] = '#'; @data[2][3] = '*';

Tuple Assignment

Tuple assignments allow you to assign items from an array to individual scalar values. A Sleep tuple is a comma separated list of variable names surrounded by parentheses on the left hand side of an assignment.

($x, $y, $z) = @array;

This example sets $x to the first item in @array, $y to the second item, and $z to the third item.

Tuple assignment sets the remaining scalars to $null when there are not enough items in @array.

Tuple assignment also works with individual values. If the value to assign is not an array, then all scalars in the tuple receive the same value. This form is useful for nulling out multiple values.

($a, $b, $c) = $null;

Assignment operations work with tuple assignment as well.

($x, $y, $z) += 3; # add 3 to $x, $y, and $z

You can also specify an array on the right hand side of a tuple assignment operation. This works as you would expect. Sleep applies the assignment operation to each scalar in the tuple and the corresponding array item.

($x, $y, $z) *= @(2, 3, 4); # $x = $x * 2; $y = $y * 3; etc..

Expand Array

Array expansion is a special case of tuple assignment operations. Wrap a single array scalar within a tuple to expand it. This is the same as typing (@a[0], @a[1], ...). The rest of the tuple assignment rules apply. This has neat implications. For example to add two arrays:

@a = @(1, 2, 3); @b = @(4, 5, 6); (@a) += @b; println("@a is: " . @a);

@a is: @(5, 7, 9)

Sorting Arrays

You can easily sort arrays with any criteria. The &sort function accepts a method for comparing two array items. Sleep also provides &sorta to sort arrays in alphabetical order. &sortn sorts integer or long arrays in numerical order and &sortd sorts double arrays.

sub caseInsensitiveCompare { $a = lc($1); $b = lc($2); return $a cmp $b; } @array = @("zebra", "Xanadu", "ZooP", "ArDvArKS", "Arks", "bATS"); @sorted = sort(&caseInsensitiveCompare, @array); println(@sorted);

@('ArDvArKS', 'Arks', 'bATS', 'Xanadu', 'zebra', 'ZooP')

Arrays: The Truth Revealed

There is something I must confess before we go further. Sleep arrays are not arrays. The Sleep array implementation uses a linked list. A linked list stores values in a chain. Each position knows only the previous and next position. To retrieve a value Sleep must walk through the list, value to value, until it locates the index. In small lists this is not likely to be noticed.

For large lists this can be a real performance bottleneck when not taken into account. Sorting and set operations are not efficient with lists. For these operations Sleep copies the list into the most suitable data structure, works on it, and then places the result into a list.

You may be thinking: "Wow, this is terrible. What possessed you to use linked lists?"

Many of Sleep's operations allow you to add or remove elemements to and from the beginning, end, and middle of an array. Linked lists are excellent for this. Linked lists also scale as large as you like without slowdown for many operations.

3.2 Stacks, Lists, and Sets

I just alluded to the idea that arrays are far more than containers that store their values with a numerical index. In this section I will introduce you to other capabilities of the mighty array.

Stacks

A stack is a first-in last-out data structure. My father pays his bills once a month. He hasn't heard of online bill paying yet. You can tell him. I tried. Anyways, when a bill arrives he places it on his desk on top of a stack of other bills. When payment time comes he reaches for the top bill and starts to work on it. The last bill he placed on the stack is the first one he works on.

You can use Sleep arrays as stacks. The last position in the array is the top of the stack. The first position is the bottom. Use &push to add data to the top position. Also &pop will remove and return the data from the top position.

push(@stack, "apple"); push(@stack, "banana"); push(@stack, "cucumber"); println("Stack is: " . @stack); $value = pop(@stack); println("Top item is: " . $value); println("Stack is: " . @stack);

Stack is: @('apple', 'banana', 'cucumber') Top item is: cucumber Stack is: @('apple', 'banana')

Voila, with this example we have a stack of fruit.

Queues

Similar to stacks are queues. Queues are first-in, first-out data structures. Use &shift to remove and return the first item of an array.

@queue = @("bottom", "middle", "top"); $bottom = shift(@queue); println($bottom); println("Queue is: " . @queue);

bottom Queue is: @('middle', 'top')

Lists

I already discussed linked lists a few sections ago. It would be a crime to not provide some list operations in Sleep. It helps to think of a list as a head (first item) followed by everything else (all items beyond the first). Grab the head of a list by indexing position 0 of an array. Use &sublist to get everything else.

@list = @("a", "b", "c"); # car/head of list... println(@list[0]); # cdr/rest of list println(sublist(@list, 1));

a @('b', 'c')

The &sublist function returns a slice of a list. Changes to a sublist affect the parent list.

@array = @("a", "b", "c", @("dd", "ee", "ff"), "g", "h"); @sub = sublist(@array, 2, 4); # note that an array scalar counts as 1 element. println(@sub); # modifications to the sublist also affect the parent. @sub[1] = "what happened?"; println(@sub); println(@array);

@('c', @('dd', 'ee', 'ff')) @('c', 'what happened?') @('a', 'b', 'c', 'what happened?', 'g', 'h')

@array prior to modifications:

The sublist of @array consisting of positions 2-4 (up to but not including 4):

Sleep creates sublists in linear time. A sublist is nothing more than a view into the parent list.

Sets and Scalar Identity

A set is a collection of values. The order of these values doesn't matter. What matters in sets is membership: testing if a value is in a set or not. A naive algorithm for set membership would simply compare a value to every item in the set. If the value is equal to any of the set members then it is in the set.

Certain set operations are possible with arrays. Sleep uses scalar identity to decide if two scalars are equivalent. The identity algorithm compares references for object scalars and function scalars. The identity function uses the string representation to compare other scalars. You can explicitly compare scalar identity with the =~ predicate.

$ java -jar sleep.jar >> Welcome to the Sleep scripting language > ? 3 =~ "3" true > ? 3 =~ "4" false

Test if a value is a member of an array with the in predicate.

> ? 3 in @("3", "4", "5", "a") true > ? "b" in @("3", "4", "5", "a") false

The union operation places the values of two sets into one set. Values that were in either set are members of the resulting set. Sleep supports the union of two arrays with the &addAll function.

The difference operation removes all values present in one set from another. A value will be a member of only one of the resulting sets. Sleep does set difference with the &removeAll function.

Finally there is the intersect operation. The intersection of two sets is all the values the two sets have in common. &retainAll provides the intersection operation.

@setA = @("apple", "ardvarks", "apes"); @setB = @("bats", "baseballs", "books", "apes"); # union operation: @result = addAll(copy(@setA), @setB); println("A union B: " . @result); # difference operation: @result = removeAll(copy(@setA), @setB); println("A difference B: " . @result); # intersect operation: @result = retainAll(copy(@setA), @setB); println("A intersect B: " . @result);

A union B: @('apple', 'ardvarks', 'apes', 'bats', 'baseballs', 'books') A difference B: @('apple', 'ardvarks') A intersect B: @('apes')

The results of these set operations on @setA and @setB are shown in this Venn diagram:

3.3 Hashes

Hash scalars hold multiple values associated with a key. Use the percent symbol at the beginning of hash names. Sleep uses this type information to create an empty hash when needed.

$x = 3; %foo["name"] = "Raphael"; %foo["job"] = "wasting time"; %foo[$x] = "Michelangelo"; println("%foo is: " . %foo);

%foo is: %(3 => 'Michelangelo', job => 'wasting time', name => 'Raphael')

You can specify hashes in place. The syntax is a percent sign followed by parentheses enclosing a comma separated list of key value pairs. Specify a key value pair with the => operator.

%hash = %(a => "apple", b => "boy", c => 3 * (9 % 7)); println("%hash is: " . %hash);

%hash is: %(a => 'apple', c => 6, b => 'boy')

Hash keys are always converted to strings. For example 3 refers to the same value as "3". Use the index operator to retrieve values from a hash.

%hash = %(a => 'apple', b => 'boy'); %hash["c"] = "cow"; println("%hash is: " . %hash); println("%hash['c'] is: " . %hash["c"]);

%hash is: %(a => 'apple', c => 'cow', b => 'boy') %hash['c'] is: cow

You can also assign hashes to eachother. This is the same as assigning an array to another array. Assigning a hash to another hash copies the reference. Both hash variables refer to the same data.

Some built-in functions return read-only hashes. Functions that try to modify a read-only hash will fail with a hard error message. Use &copy to create a copy of a read-only hash.

To remove a key from a hash set the item to $null or use &removeAt. To remove a value use the &remove function.

Use &keys to get an array of keys in a hash. Hash keys are unordered.

%data = %(a => "AT-ST Walker", b => "bat", c => "cat", d => 43); foreach $var (keys(%data)) { println($var); }

d a c b

Ordered Hashes

Actually, I lied. Not all hashes have unordered keys. These are ordered hashes. Ordered hashes created with &ohash keep track of insertion order. The oldest key is at the beginning of the list and the newest key is at the end. The &ohasha keeps track of access order. The access ordered hash moves keys to the end of the order after each request. With access ordered hashes the first key is the least recently used key.

%random = %(a => "apple", b => "boy", c => "cat", d => "dog"); println("Random: " . %random); %ordered = ohash(a => "apple", b => "boy", c => "cat", d => "dog"); println("Ordered: " . %ordered);

Random: %(d => 'dog', a => 'apple', c => 'cat', b => 'boy') Ordered: %(a => 'apple', b => 'boy', c => 'cat', d => 'dog')

Ordered hashes may have hit and removal policies associated with them. Sleep calls the removal policy prior to adding a new key. The removal policy decides wether to remove the key at the beginning of the list or not. Sleep invokes the miss policy when a key with no value is requested.

%answers = ohash(); setMissPolicy(%answers, { return 42; # default value }); println(%answers); println(%answers["life"]); println(%answers["the universe"]); println(%answers["everything"]); println(%answers);

%() 42 42 42 %(life => 42, the universe => 42, everything => 42)

The removal policy coupled with the access ordered hash makes a great least recently used cache mechanism. We will explore this further in chapter 8.

Mixing Arrays/Hashes

Multidimensional hashes work the same as Sleep arrays. A hash is a scalar that holds other scalar. Hashes can hold arrays, scalars, and other hashes. Arrays can hold hashes as well.

%hash = %(letters => @("a", "b", "c", "d"), names => %( rsm => "Raphael Mudge", fvm => "Frances Mudge") );

Sleep will create a new hash or array when a script tries to index to a nonexistent level. Sleep uses the variable name at the start of the index stack to decide which data structure to create.

The index operator is merely an operator. It operates on values, not variable names. You can apply the index operator to function calls, expressions, and $scalars. Sleep will return $null after trying to index a new level in these contexts.

$temp = split(' ', "A B C")[1]; # $temp is now "B"

Hash Pitfalls

When you request an item from a hash, Sleep checks if it exists. It if does, the value associated with the key is returned. If it doesn't, Sleep adds the specified key to the hash and associates $null with it.

Use the in predicate to check if a key is in a hash. The in predicate will look into the hash and see if a non-$null value exists without adding the key if it doesn't.

Assuming hash gets are a read-only operation is a common pitfall in multi-threaded Sleep applications. Checking if a key is in a hash using the index operator instead of in may lead to memory leaks in your application.

Be aware that &size and &keys loop through your hash and remove keys with $null values. This is transparent to you but you should know that &size executes in linear time based on the number of elements in your hash. Use [[%hash getData] size] to get size of a hash in constant time when this cleanup behavior is not desired.

Using &size in loops may slow down your Sleep programs.

Circular References

Sleep makes some concessions to allow arrays and hashes to reference themselves. These are circular references. You can use this feature to represent graphs in Sleep.

Nodes and edges make a graph. Edges connect one node to another.

The array and hash string description points out a circular dependency with a @ or % followed by a number. Count the number of opening parentheses from the left to find the data structure the number refers to.

# node $n: $n[0] = arbitrary data; $n[1 .. n] = edges sub node { return @($1); } sub add_edge { push($1, $2); } $a = node("a"); $b = node("b"); $c = node("c"); add_edge($a, $b); add_edge($b, $a); add_edge($b, $c); add_edge($b, $b); add_edge($c, $c); println("a: $a"); println("b: $b"); println("c: $c");

a: @('a', @('b', @0, @('c', @2), @1)) b: @('b', @('a', @0), @('c', @2), @0) c: @('c', @0)

This graph is shown here.

Each node is an array. The first item of the array is the value of the node. The rest of the array items represent the outgoing edges from the node.