8. Input / Output

8.1 I/O Handles

Sleep scripts can interact with a multitude of I/O sources. Open operations return an object scalar that references a sleep.bridges.io.IOObject. Throughout this chapter I refer to an IOObject as a handle. You can read, write, and manipulate a handle with Sleep's I/O library.

Files

$handle = openf("/etc/passwd");
while $text (readln($handle))
{
   println("Read: $text");
}

The &openf function returns an I/O handle. The assignment loop reads the contents of the file. Most read functions return $null when there is no data left. This makes assignment loops a great tool for iterating the contents of a handle.
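As a minimal sketch of the same assignment-loop pattern, the following script counts the lines in a file. It assumes the file name arrives as the first command-line argument; the variable names are illustrative only.

```sleep
# count the lines in the file named by the first argument (assumed usage)
$handle = openf(@ARGV[0]);
$count = 0;

# readln returns $null at the end of the file, which ends the loop
while $line (readln($handle))
{
   $count++;
}

closef($handle);
println("Lines: $count");
```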

The &openf function can append to or overwrite a file. Prefix the filename with a > to specify overwrite or a >> to specify append. These character sequences have the same meaning as redirect operators in UNIX and Windows command shells.

# overwrite data.txt
$handle = openf(">data.txt");
println($handle, "this is some data.");
closef($handle);

This example overwrites the contents of data.txt. At the end of the script, &closef closes the handle. &closef is the universal function for closing any I/O source.
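Appending works the same way with the >> prefix. This is a small sketch; the file name data.txt is reused from the example above purely for illustration.

```sleep
# append a line to data.txt instead of overwriting it
$handle = openf(">>data.txt");
println($handle, "this line is added to the end.");
closef($handle);
```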

Query the open or closed state of a handle with the end-of-file (-eof) predicate.

$handle = openf("/etc/passwd");
# do something...
closef($handle);

if (-eof $handle)
{
   println("handle is closed!");
}

I/O Errors

$handle = openf("fjkdfdjkslgds");
if (checkError($error))
{
   println("Could not open file: $error");
}

Could not open file: java.io.FileNotFoundException: /Users/raffi/manual/manual/fjkdfdjkslgds (No such file or directory)

I/O failures are soft errors. They are silent unless you place a &checkError after each call or set &debug level 2 to print errors as they occur.

Data Integrity

Scripts can assure the integrity of data across a stream by calculating a checksum or cryptographic digest. To calculate a digest or checksum on a handle use the &digest or &checksum functions before reading (or writing) data.

# generate an MD5 digest of any file.
sub md5
{
   $handle = openf($1);
   $digest = digest($handle, "MD5");

   # consume the handle
   skip($handle, lof($1));
   closef($handle);

   $result = unpack("H*", digest($digest))[0];
   println("MD5 ( $+ $1 $+ ) = $result");
}

md5(@ARGV[0]);

This example is a generic function that mimics the UNIX md5 command. This script calls &digest in 2 different contexts. The first call sets up the digest and specifies the algorithm. The second call gets the bytes representing the MD5 digest of all data that traversed through the handle.
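A checksum can presumably be calculated the same way. The sketch below assumes that &checksum follows the same two-call pattern as &digest and that "CRC32" is an accepted algorithm name; neither detail is confirmed by this chapter, so treat both as assumptions to check against the function reference.

```sleep
# ASSUMPTION: &checksum mirrors the two-call usage of &digest shown above,
# and "CRC32" is a valid algorithm name.
$handle = openf(@ARGV[0]);
$check = checksum($handle, "CRC32");

# consume the handle so every byte passes through the checksum
skip($handle, lof(@ARGV[0]));
closef($handle);

println("CRC32 = " . checksum($check));
```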

Filesystem

Operating systems vary in their choice of path separation character. Sleep uses the forward slash as a universal separation character. I/O functions replace the forward slash in filenames with the platform-specific separation character.

Sleep keeps track of the current working directory of a script. You can set this value with &chdir. Scripts can get the current directory with &cwd. All file operations take this value into account.
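A short sketch of how the working directory affects relative file names follows. The directory /tmp and the file name scratch.txt are assumptions chosen for illustration.

```sleep
println("Started in: " . cwd());

# change the working directory (assumes /tmp exists on this system)
chdir("/tmp");
println("Now in: " . cwd());

# a relative file name is now resolved against the new directory
$handle = openf(">scratch.txt");
println($handle, "written relative to the working directory");
closef($handle);
```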

Query the directory structure of the file system with the &ls and &listRoots functions. Use &mkdir to create new directories.

# a functional way to recurse all files.
map({
   if (-isDir $1)
   {
      map($this, ls($1));
   }
   else
   {
      println($1);
   }
}, listRoots());

/.DS_Store
/.hotfiles.btree
/Applications/.DS_Store
/Applications/.localized
/Applications/Address Book.app/Contents/Info.plist
...
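For a single directory, a non-recursive listing is simpler. This sketch creates a directory and lists the current working directory; the directory name reports is hypothetical.

```sleep
# create a directory below the current working directory
mkdir("reports");

# list the entries of the current working directory, one per line
foreach $entry (ls(cwd()))
{
   println($entry);
}
```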

Get a file size with &lof which stands for length of file. Rename files with &rename or do away with them using &deleteFile.

$path = getFileProper("/Users/raffi/", "fizz", "buzz/", "foo.txt");
println($path);
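The &lof, &rename, and &deleteFile functions mentioned above can be combined as in the following sketch. The file names are hypothetical and the script leaves nothing behind when it finishes.

```sleep
# create a small scratch file (the file names are hypothetical)
$handle = openf(">scratch.txt");
println($handle, "temporary data");
closef($handle);

println("scratch.txt is " . lof("scratch.txt") . " bytes");

# rename the file and confirm that the new name exists
rename("scratch.txt", "scratch.bak");
if (-exists "scratch.bak")
{
   println("renamed to scratch.bak");
}

# clean up
deleteFile("scratch.bak");
```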

Example: Working with Large Datasets

Ever had to write a script that processes so much data that it eventually runs out of memory? Is your name Marty? If you answered yes to either of these questions then read this section. Working with large data sets sometimes requires swapping data to the disk and reconstituting it.

This example uses an access ordered hash as a least-recently-used cache for data stored on the file system. The cache flushes the least recently used data to the disk as it grows in size. The cache restores flushed data when it is requested in the future.

debug(7);

# if a miss occurs, check if the key is cached
# in the current directory and load it.
sub missPolicy
{
   local('$handle $data');

   if (-exists $2)
   {
      $handle = openf($2);
      $data = readObject($handle);
      closef($handle);

      println("--- Loaded $2");
      return $data;
   }

   return $null;
}

# if the size of the data structure is over 3
# elements then save it to the disk.
sub removalPolicy
{
   local('$handle');

   if (size($1) >= 3)
   {
      $handle = openf("> $+ $2");
      writeObject($handle, $3);
      closef($handle);

      println("+++ Saved $2");
      return 1;
   }

   return 0;
}

# lets test it out...
global('%data');

%data = ohasha();
setMissPolicy(%data, &missPolicy);
setRemovalPolicy(%data, &removalPolicy);

add(%data, a => "apple", b => "batz", c => "cats");
println(%data);

println("Access 'a': " . %data["a"]);
println(%data);

add(%data, d => "dog");
println(%data);

println("Access 'b': " . %data["b"]);
println(%data);

$ java -jar sleep.jar cache.sl
%(a => 'apple', b => 'batz', c => 'cats')
Access 'a': apple
%(b => 'batz', c => 'cats', a => 'apple')
+++ Saved b
%(c => 'cats', a => 'apple', d => 'dog')
--- Loaded b
+++ Saved c
Access 'b': batz
%(a => 'apple', d => 'dog', b => 'batz')

This example serializes Sleep data with &readObject and &writeObject. These functions can convert most Sleep data to bytes and dump them to a stream.

As a side note: serialization is sensitive to the versions of Sleep and Java you are using. Scripts running the same combination, for example Sleep 2.1 on Java 1.5, can read each other's objects. Objects written by Sleep running on top of Java 1.6 may not be readable by Sleep running on top of Java 1.5.

Console

Most applications have access to a STDIN and STDOUT file to read and write data to the operator console. Sleep is no different. Sleep's I/O functions default to the console when no $handle is specified.

print("What is your name? ");
$name = readln();
println("Hello $name $+ , it is a pleasure to meet you.");

Network Client/Server

Writing TCP/IP clients in Sleep is very easy. Simply use the &connect function to establish a connection to a server:

$handle = connect(@ARGV[0], 31337);
println($handle, "hello echo server");
$text = readln($handle);
println("Read: $text");
closef($handle);

The example above is a client for an echo service. The corresponding server for this echo client is:

$socket = listen(31337, 60 * 1000, $host);
println("Received connection from $host");
$text = readln($socket);
println("Read: $text");
println($socket, "PONG! $text");
closef($socket);

$ java -jar sleep.jar echosrv.sl
Received connection from 127.0.0.1
Read: hello echo server

The first call to &listen will register your script with the operating system as owning the specified network port. Subsequent calls to &listen accept a waiting connection attempt or block until a connection attempt occurs. To stop your application from listening on a port use &closef with the port number as a parameter.
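The life cycle described above might look like the following sketch. It serves three connections one after another and then releases the port. The port number, the connection limit, and the greeting text are arbitrary choices for illustration.

```sleep
# the first call registers port 31337; later calls accept waiting connections
for ($x = 0; $x < 3; $x++)
{
   # a timeout of 0 is used here as in the threaded example later in this chapter
   $socket = listen(31337, 0);
   println($socket, "you are connection $x");
   closef($socket);
}

# stop listening: pass the port number, not a handle, to closef
closef(31337);
```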

To close the write portion of an I/O handle (causing a potential end-of-file on the other end) use &printEOF.
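One plausible use of &printEOF is a client that sends all of its data, signals that it is finished, and then reads until the server closes its side. This sketch assumes a server on port 31337 that replies after it sees the end-of-file; the port and the messages are illustrative.

```sleep
$handle = connect(@ARGV[0], 31337);

println($handle, "first line");
println($handle, "last line");

# close only the write side; the handle can still be read
printEOF($handle);

while $text (readln($handle))
{
   println("Read: $text");
}

closef($handle);
```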

Use &fork to write multithreaded servers. Polling is also possible. Use &available to check the number of bytes available on a handle.

$handle = connect("www.yahoo.com", 80);
println($handle, "GET /");
sleep(3000);
println(available($handle) . " bytes are available");

Threads (Pipe I/O)

A thread is an asynchronous unit of execution. Threads in Sleep execute independently of each other. They can be thought of as separate programs executing independently of one another. Threads do not share data by default. They can share data, but then your responsibility as a programmer increases. Sleep threads can communicate with an I/O channel known as a pipe. These topics are covered in this section. First I'd like to go back to network clients and servers.

sub echoClient
{
   $text = readln($socket);
   println($socket, "back at ya: $text");
   closef($socket);
}

while (1)
{
   $server = listen(8888, 0);
   fork(&echoClient, $socket => $server);
}

$ telnet 127.0.0.1 8888
Connected to localhost.
hello world
back at ya: hello world
Connection closed by foreign host.
$ telnet 127.0.0.1 8888
Connected to localhost.
uNF
back at ya: uNF
Connection closed by foreign host.

This example uses a while loop to listen for and accept connections. Each connection receives its own thread.

Sharing Data

&fork accepts any number of key value pairs to share data between the current thread and the new thread. Pass by value and pass by reference rules apply to &fork arguments. If you want to share an updateable value between threads store it in a shared hash or array.

Code that manipulates shared data in a thread is known as a critical section. Semaphores exist to protect critical sections of code. Semaphores are flexible locks with atomic &acquire and &release operations.

Sleep associates a count with each semaphore. This count determines the number of threads that can &acquire the semaphore before a &release. A semaphore with a count of 1 is a binary semaphore. A binary semaphore allows only one thread to &acquire it before a &release.

sub computation
{
   for ($x = 0; $x < 50000; $x++)
   {
      acquire($lock);
      %share["resource"] += $number;
      release($lock);
   }
}

%share["resource"] = 0;
$lock = semaphore(1);

$a = fork(&computation, \%share, \$lock, $number => 1);
$b = fork(&computation, \%share, \$lock, $number => -1);

wait($a);
wait($b);

println(%share["resource"]);

In this example one thread increments and another decrements a shared resource. The protection must work as the end result is zero. Does this protection really matter? I can understand some skepticism. Here is the same example without the protection:

sub computation
{
   for ($x = 0; $x < 50000; $x++)
   {
      %share["resource"] += $number;
   }
}

%share["resource"] = 0;

$a = fork(&computation, \%share, $number => 1);
$b = fork(&computation, \%share, $number => -1);

wait($a);
wait($b);

println(%share["resource"]);

Not the expected value, is it? The end lesson: protect your shared resources with semaphores!
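A semaphore with a count greater than 1 can limit how many threads run a section at once. The sketch below starts four workers but lets only two proceed at a time. The worker body, the one-second pause, and the names $slots and $id are assumptions made for illustration.

```sleep
sub worker
{
   # blocks while two other workers already hold the semaphore
   acquire($slots);

   println("worker $id is running");
   sleep(1000);
   println("worker $id is done");

   release($slots);
}

# a count of 2 lets two threads acquire before anyone must release
$slots = semaphore(2);
@threads = @();

for ($x = 0; $x < 4; $x++)
{
   push(@threads, fork(&worker, \$slots, $id => $x));
}

# join every worker before the script exits
foreach $thread (@threads)
{
   wait($thread);
}
```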

Inter-thread communication

Like other I/O functions, &fork returns an I/O handle when called. Scripts can write to and read from this handle. But what are they reading from or writing to? &fork creates a global variable $source within each thread. This variable is the other end of the pipe returned by &fork. Data written to $source is read from the fork handle. Data written to the fork handle is read at $source.

sub a
{
   println("This is thread a");
   while $value (readln($source))
   {
      println("Read: $value");
   }
   println("done!");
}

$handle = fork(&a);
println($handle, "hello a");
println($handle, "blah blah");
closef($handle);

Sometimes it is helpful to communicate values between threads. Scripts can only communicate copies as the threads are isolated from each other. Use &readObject and &writeObject to copy values between threads.
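The pipe works in both directions. In this sketch the thread reads lines from $source and writes an upper-cased reply back to it, while the parent reads the reply from the fork handle. The function name upper is hypothetical.

```sleep
sub upper
{
   # read requests from the parent and answer on the same pipe
   while $line (readln($source))
   {
      println($source, uc($line));
   }
}

$handle = fork(&upper);

println($handle, "hello thread");
println("Reply: " . readln($handle));

# closing the handle ends the assignment loop in the thread
closef($handle);
```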

sub a
{
   @data = @("a", "b", "c");
   writeObject($source, @data);

   @stuff = @(1, 2, 3);
   writeObject($source, @stuff);
}

$handle = fork(&a);

@a = readObject($handle);
println("Read array: " . @a);

@b = readObject($handle);
println("Read array: " . @b);

Connecting two I/O connections

&fork can connect two I/O handles together into a virtual pipe. This example implements a generic TCP/IP client with this technique:

debug(7);

global('$host $port $socket');

sub handler
{
   local('$text');
   while $text (readln($src))
   {
      println($dst, $text);
   }
   closef($dst);
}

# obtain our host, port from the command line arguments
($host, $port) = @ARGV;

# connect to the desired host:port combination
$socket = connect($host, $port);

# fork the reader for the socket; prints all output to the console
fork(&handler, $src => $socket, $dst => getConsole());

# fork the reader for the console; prints all output to the socket
fork(&handler, $src => getConsole(), $dst => $socket);

$ java -jar sleep.jar connect.sl irc.blessed.net 6667
USER a b c :the phanton menace
NICK rawClient
PING :F826A790
PONG :F826A790
:irc.blessed.net 001 rawClient :Welcome to the EFNet Internet Relay Chat Network rawClient
JOIN #jircii
:[email protected] JOIN :#jircii
:irc.blessed.net 332 rawClient #jircii :http://jircii.hick.org b42 (11.26.07) released! | http://blog.printf.no | sign up for the sleep google group: http://sleep.hick.org/
:irc.blessed.net 353 rawClient = #jircii :rawClient ceelow Drakx_ ph @Drakx[L] @Drakx @[Serge] @strider_ @seph_ @ph__ @ph____ @iHTC @`butane
:irc.blessed.net 366 rawClient #jircii :End of /NAMES list.
PRIVMSG #jircii :hi
QUIT :good bye!
:[email protected] QUIT :Client Quit
ERROR :Closing Link: cpe-72-226-177-132.twcny.res.rr.com (Client Quit)

Waiting for a thread to complete

Sometimes it is helpful to spin off a new thread of execution, do some stuff in the current thread, and then wait for the new thread to complete. This is a join operation. Use &wait to join two threads. The &wait function will block until the thread completes or $source is closed.

sub factorial
{
   sub calculateFactorial
   {
      return iff($1 == 0, 1, $1 * calculateFactorial($1 - 1));
   }

   $result = calculateFactorial($value);
   println("fact( $+ $value $+ ) is ready");
   return $result;
}

$fact12 = fork(&factorial, $value => 120.0);
$fact11 = fork(&factorial, $value => 110.0);
$fact10 = fork(&factorial, $value => 100.0);

println("fact(120) = " . wait($fact12));
println("fact(110) = " . wait($fact11));
println("fact(100) = " . wait($fact10));

fact(120.0) is ready
fact(110.0) is ready
fact(100.0) is ready
fact(120) = 6.689502913449124E198
fact(110) = 1.5882455415227421E178
fact(100) = 9.33262154439441E157

Avoiding closure deadlock between threads

If you create a thread, pass a closure to it, and execute that closure, your program will likely deadlock. The deadlock happens because the thread that created the new thread is still executing and owns the closure you passed. Sleep synchronizes closures so that only one can execute at a time per environment. This is why each thread must execute in its own environment.

If you want to copy a closure to a new script environment use &writeObject to the thread $handle and read it from $source. Sleep closures including continuations and coroutines can be copied this way.
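The copy technique might look like the sketch below. The thread reads the closure from $source and calls it, so it executes its own copy rather than the parent's closure. The function name runner, the doubling closure, and the argument 21 are all illustrative assumptions.

```sleep
sub runner
{
   local('$f');

   # receive a copy of the closure; it belongs to this thread's environment
   $f = readObject($source);
   println("result: " . [$f: 21]);
}

$handle = fork(&runner);

# serialize the closure into the pipe instead of passing it to &fork
writeObject($handle, { return $1 * 2; });

wait($handle);
```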

Another trick: you may create a new thread, generate any objects that you want to share, and return them in the thread. You can obtain these returned values using the &wait function. This is useful for sharing ordered hashes with associated policies between threads. You still have to take care that the hash is protected with synchronization when a write may occur but this at least protects you from closure deadlock.

sub createCache
{
   local('%cache');
   %cache = ohash();
   setMissPolicy(%cache, { return 42; });
   return %cache;
}

# obtain an ordered hash whose closures are not
# associated with any other script environment.
# Protects against deadlock :)
%data = wait(fork(&createCache));

External Programs

Interacting with an external program is very similar to interacting with a thread. The &exec function returns a handle that acts as a pipe between Sleep and an external program. This pipe provides access to the program's standard input (STDIN) and standard output (STDOUT).

$handle = exec("./printargs apple boy charlie");
printAll(readAll($handle));
closef($handle);

&exec accepts a command in two forms. The first form is a simple string. If a string is provided, Sleep will tokenize the string using whitespace as the delimiter. &exec uses the first token as the command and the subsequent tokens as individual arguments.

&exec also accepts an array instead of a string. The first element of the array is the command. The other elements represent the arguments. These arguments may contain whitespace or any other special characters.

$handle = exec(@("./printargs", "Hello world", "I have spaces"));
printAll(readAll($handle));
closef($handle);

STDERR (and other manipulations of IOObject)

STDERR is the standard error stream. Sleep does not provide a function for obtaining a handle to STDERR. This is not an issue because you can get a handle for STDERR with a few object expressions.

Most handles have a source associated with them. Use the getSource method of IOObject to obtain the source backing the handle.

$handle = openf("a.txt");
$source = [$handle getSource];
$class = [$source getClass];
println("Source of \$handle is $class");

Scripts can fuse an arbitrary InputStream or OutputStream into an I/O handle. Do this with the getIOHandle method in the sleep.runtime.SleepUtils class. This next example illustrates this technique.

$handle = exec("./printargs");

sub processStderr
{
   $source = [$handle getSource]; # java.lang.Process
   $stream = [$source getErrorStream]; # java.io.InputStream

   $stderr = [SleepUtils getIOHandle: $stream, $null];

   while $error (readln($stderr))
   {
      println("[stderr] $error");
   }
}

# process stderr in a new thread
fork(&processStderr, \$handle);

printAll(readAll($handle));

Backtick Expressions

Scripts can execute a process with a string enclosed in backticks. The interpreter evaluates a backtick as a parsed literal and then executes the resulting string. The output of the execution is returned as an array.

# recursively find all files
@files = `find .`;

# print out the number of files we found
println("There are " . size(@files) . " files here. :)");

Buffers

For times when speed is a necessity Sleep provides the friendly byte buffer. A buffer is a segment of memory that scripts can write to (and eventually read from) using Sleep's I/O functions. Buffers are fast. If there is a need to concatenate lots of data or to manipulate streams of data then buffers are a must.

To allocate a buffer use the &allocate function. Buffers are write-only once allocated.

&closef on a write-only buffer will make the contents available for reading. Consequently the buffer is read-only from this point.
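The write-then-read life cycle of a buffer can be sketched in a few lines. This example assumes an initial size of 1024 bytes and uses line-oriented functions on the buffer; both choices are illustrative.

```sleep
# the buffer starts out write-only
$buffer = allocate(1024);
println($buffer, "first line");
println($buffer, "second line");

# closing the buffer switches it to read-only
closef($buffer);

while $line (readln($buffer))
{
   println("Buffered: $line");
}
```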

The following program encrypts a file using a simple XOR encryption scheme. It uses the allocated buffer as a convenient place to hold data before dumping it to a file.

# read the file in
$input = openf(@ARGV[0]);
$data = readb($input, -1);
closef($input);

# encrypt the contents of the file...
$buffer = allocate(strlen($data));

for ($x = 0; $x < strlen($data); $x++)
{
   writeb($buffer, chr(byteAt($data, $x) ^ 0x34));
}

closef($buffer);

# buffer is readable now..
$data = readb($buffer, strlen($data));

# write the file out
$output = openf(">" . @ARGV[0]);
writeb($output, $data);
closef($output);

$ cat >contents.txt
pHEAR the reapz0r
$ java -jar sleep.jar encryptxor.sl contents.txt
$ cat contents.txt
D|quf@\QFQUDNF>
$ java -jar sleep.jar encryptxor.sl contents.txt
$ cat contents.txt
pHEAR the reapz0r

8.2 String I/O

An important distinction between String I/O and Binary I/O is the interpretation of the data. Sleep's string I/O functions are Unicode-aware.

Many programmers are comfortable with ASCII. ASCII is a common agreement on what characters are represented by the integers 0-127. Most times an ASCII character is stored in 8-bits. This leaves 128-255 open for interpretation. Different "character sets" evolved over time. Some character sets used these high-ascii characters to represent accented characters for other languages. My favorite, Cp437, used them to represent line drawing characters in a terminal.

This use of ASCII as a string representation has some limitations. For example, applications are limited to 128 extra characters. This is not enough for some languages. The other problem is a lack of a way to identify which character set is in use.

To solve these problems Unicode was invented. Unicode is a standard universal mapping for all known characters you would ever care to represent on a screen.

Java uses UTF-16 to store Unicode characters in memory. This means each character is represented with 2 bytes. &readc consumes 2 bytes from a handle to read a UTF-16 character. Other functions such as &print, &println, &readln, etc. read and write 1-byte ASCII characters. These functions rely on a character set encoding to remap the extended characters.

Your platform has a default encoding associated with it. For the most part this is transparent to you and you'll never care about the distinction. You can use &setEncoding to specify which encoding to use when reading from or writing to a handle.
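As a small sketch, the following script sets an explicit encoding on a handle before writing to it. The file name c.txt and the choice of UTF-8 are assumptions made for illustration.

```sleep
$handle = openf(">c.txt");

# use UTF-8 instead of the platform default for this handle
setEncoding($handle, "UTF-8");

println($handle, "apple");
closef($handle);

println("UTF-8 output: " . lof("c.txt") . " bytes");
```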

&pack, &unpack, &bwrite, and &bread can read and write strings of 16-bit characters. The size difference between encoded characters and UTF-16 is demonstrated below:

# encoded output
$handle_a = openf(">a.txt");
println($handle_a, "apple");
closef($handle_a);

println("Encoded output: " . lof("a.txt") . " bytes");

# UTF-16 output
$handle_b = openf(">b.txt");
bwrite($handle_b, "u", "apple");
closef($handle_b);

println("UTF-16 output: " . lof("b.txt") . " bytes");

8.3 Binary I/O

Scripts can read and write bytes with &readb and &writeb. Sleep stores sequences of bytes in string scalars. This means you can use &substr, &strlen, and other string manipulation functions on binary data. Use &byteAt to get a byte from a string.

# simple file copy program.
# java -jar sleep.jar copy.sl <target> <destination>

($target_f, $dest_f) = @ARGV;

$handle = openf($target_f);
$data = readb($handle, -1);
closef($handle);

$handle = openf("> $+ $dest_f");
writeb($handle, $data);
closef($handle);

This example is a file copy script. Nothing too special here. This script opens a file, reads all of its contents in, and writes them out.
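Because byte sequences live in string scalars, string functions and &byteAt work together. This sketch prints the numeric value of the first four bytes of a file. It assumes the file is named by the first argument and holds at least four bytes.

```sleep
$handle = openf(@ARGV[0]);
$header = readb($handle, 4);
closef($handle);

# strlen counts bytes here because $header holds binary data
for ($x = 0; $x < strlen($header); $x++)
{
   println("byte $x = " . byteAt($header, $x));
}
```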

Interpreting Bytes

Computers store all data as 1s and 0s at some point. Since computers store all data this way, it makes sense that any data can be interpreted in different ways.

Consider the 32-bit string 11000000101010000000000100000001. When interpreted as an unsigned integer, this string has the value 3,232,235,777. Let us interpret this same string as 4 separate bytes:

11000000 10101000 00000001 00000001

This is the same 32-bit string. Each byte contains 8 bits, so the whole string yields 4 bytes. Their values are 192, 168, 1, and 1.

$ java -jar sleep.jar
>> Welcome to the Sleep scripting language
> interact
>> Welcome to interactive mode.
Type your code and then '.' on a line by itself to execute the code.
Type Ctrl+D or 'done' on a line by itself to leave interactive mode.
$bytes = pack("I", 3232235777L);
@bytes = unpack("B4", $bytes);
println(@bytes);
.
@(192, 168, 1, 1)
println(formatNumber( 3232235777L, 10, 2));
.
11000000101010000000000100000001

Pack and Unpack

&pack and &unpack condense and extract Sleep types to and from byte strings. These functions interpret data as specified in a template.

&pack accepts a template and a comma separated list of items to pack into a byte string. &unpack accepts a byte string and extracts items according to the template.

Character	Bytes	Description
b	1	byte (-128 to 127) (converted to/from a sleep int)
B	1	unsigned byte (0 to 255) (converted to/from a sleep int)
c	2	UTF-16 Unicode character
C	1	normal character
d	8	double (uses IEEE 754 floating-point "double format" bit layout)
f	4	float (uses IEEE 754 floating-point "single format" bit layout)
h	1	a hex byte (low nybble first)
H	1	a hex byte (high nybble first)
i	4	integer
I	4	unsigned integer (converted to/from a sleep long)
l	8	long
M	0	mark this point in the IO stream (for reads only)
o	variable	sleep scalar object (used to serialize/deserialize scalars)
R	0	reset this stream to the last mark point (reads only)
s	2	short (converted to/from a sleep int)
S	2	unsigned short (converted to/from a sleep int)
u	variable	read/write UTF-16 character data until terminated with a null byte. (see note below)
U	variable	read/write the specified number of UTF-16 characters (consumes the whole field)
x	1	skips a byte/writes a null byte in/to this stream (no data returned)
z	variable	read/write character data until terminated with a null byte. (see note below)
Z	variable	read/write the specified number of characters (consumes the whole field)

Follow up any of the template characters with an integer to repeat that element some number of times. Use a * to indicate that all remaining data should be interpreted with the most recent character. Whitespace is ignored inside of template strings.
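A round trip through &pack and &unpack shows how repeat counts work. In this sketch the template "S B2" describes one unsigned short followed by two unsigned bytes. The values are arbitrary.

```sleep
# one unsigned short (2 bytes) followed by two unsigned bytes
$bytes = pack("S B2", 8080, 192, 168);
println("Packed " . strlen($bytes) . " bytes");

# the same template extracts the items again
@items = unpack("S B2", $bytes);
println(@items);
```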

Network byte (big endian) order is the default for all reads/writes. Scripts can indicate endianness if they choose. A + appended to a template character indicates big endian. A - appended to a character indicates little endian. The ! indicates the platform native byte order.

$ java -jar sleep.jar
>> Welcome to the Sleep scripting language
> x iff(unpack('i!', pack('i+', 1))[0] == 1, "big endian", "little endian")
big endian
$ uname -a
Darwin beardsley.local 8.8.0 Darwin Kernel Version 8.8.0: Fri Sep 8 17:18:57 PDT 2006; root:xnu-792.12.6.obj~1/RELEASE_PPC Power Macintosh powerpc
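To see the byte order directly, pack the same integer both ways and print the individual bytes. This is a minimal sketch that uses only the + and - modifiers described above; the value 1 is arbitrary.

```sleep
$big = pack("i+", 1);
$little = pack("i-", 1);

# print the four bytes of each string in the order they were written
println("big endian:    " . unpack("B4", $big));
println("little endian: " . unpack("B4", $little));
```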

Interoperability with C

One powerful feature of pack and unpack is interoperability with C. Think of a pack and unpack template as the definition of a C struct. Each template character represents a member of the struct.

This C program creates two records and serializes them to a file (and you thought this was a Sleep/Java book):

For my next magic trick, I will use Sleep to extract this information and display it right before your very eyes.

The first step is to create a template that represents the following C structure.

This structure contains a 30 character string, an int field, and a date value stored as a long.

This script uses &bread to read each struct entry into an array. This script multiplies the extracted time by 1000 to convert it to milliseconds. Sleep represents date time values in milliseconds.

$ java -jar sleep.jar interop.sl
Name: Raphael
Age: 26
Created: Sun, 17 Jun 2007 23:07:21 -0400
Name: Frances
Age: 25
Created: Sun, 17 Jun 2007 23:07:21 -0400
