The NoSQL system comprises a set of programs called Operators.
Each operator is a separate program module that performs a unique function on the data. Operators can be grouped into data movers, report generators and utilities. The command 'nosql whatis [string]' displays a short description of all the NoSQL operators whose names contain string.
The data movers are operators that extract or rearrange the data in some way. They each read a table via STDIN and write a table via STDOUT, and so are frequently connected using the UNIX pipe function to form a larger task. Each operator in such a "pipeline" style of operation gets its input from the output of the previous operator in the "pipeline".
The report generators each read a table via STDIN and produce a report on STDOUT, so when they are in a "pipeline" of operators they will be the operator at the end.
The utilities are used for manipulating the structure and content of tables and for other miscellaneous tasks, and are generally used as separate programs, i.e. they do not read STDIN.
Note: although some of the programs have names that are similar to those found in the commercial package /rdb, they do not share any code with the latter, and their behaviour is generally different from that of their /rdb counterparts. A number of operators are implemented simply as calls to ordinary UNIX utilities, for instance:
Same as : tail +3 < table
Same as : sed -n 2p < table
Same as : head -2 < table
Same as : head -1 < table
Same as : cat -vte < table
Once again, this shows how powerful the UNIX operating system already is on its own, and how handy it can be for an add-on package like NoSQL to be able to tap into this power without having to re-invent the wheel.
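The equivalences listed above can be tried out on a small hand-made table. The following is only a sketch: it assumes the usual table layout in which line 1 holds the column names, line 2 is a dash line and the data rows start on line 3, and the sample columns (name, qty) are made up for illustration. Note that 'tail -n +3' is the POSIX spelling of the older 'tail +3' form, which some current systems no longer accept.

```shell
# Build a small sample table (assumed layout: names, dash line, rows).
table=$(mktemp)
printf 'name\tqty\n----\t---\napple\t3\npear\t7\n' > "$table"

# Data rows only: everything from line 3 onward.
body=$(tail -n +3 < "$table")

# The second line only.
second=$(sed -n 2p < "$table")

# The first two lines, and the first line only.
top2=$(head -n 2 < "$table")
top1=$(head -n 1 < "$table")

printf '%s\n' "$body"
rm -f "$table"
```

Each command reads the table on STDIN and writes to STDOUT, so any of them can sit in the middle of a pipeline.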
Invoking NoSQL programs and operators is straightforward :
nosql command [options] [arguments]
where nosql is the NoSQL front-end driver, and command is the NoSQL operator or utility you want to run, followed by any options and arguments.
Make sure the directory containing the nosql command is in your PATH (it is usually /usr/local/bin/).
If you run NoSQL interactively from your UNIX shell prompt, or if you use it from within shell scripts, you can make the invocation command much more efficient by including the shell statement :
. /usr/local/lib/nosql/sh/nosqlmain
in your shell ~/.profile and at the beginning of shell scripts that need to use NoSQL. This will make the nosql() shell function known to your login shell or to your scripts respectively, which means they will be able to invoke the NoSQL front-end without spawning an additional shell, thus saving a bit of overhead.
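The general mechanism, a sourced file that defines a front-end function in the current shell, can be sketched as follows. The file and the demo_frontend function are hypothetical stand-ins written for this example; they are not the real nosqlmain or nosql() and only mimic the dispatch-on-first-argument idea.

```shell
# Hypothetical stand-in for a library file that defines a front-end function.
lib=$(mktemp)
cat > "$lib" <<'EOF'
demo_frontend() {
    # Dispatch on the first argument, as a front-end driver would.
    cmd=$1; shift
    case $cmd in
        upper) tr 'a-z' 'A-Z' ;;
        count) wc -l | tr -d ' ' ;;
        *) echo "unknown command: $cmd" >&2; return 1 ;;
    esac
}
EOF

# Source the file: the function is now defined in the current shell,
# so calling it does not require starting a separate wrapper script.
. "$lib"
rm -f "$lib"

result=$(printf 'a\nb\n' | demo_frontend upper | demo_frontend count)
echo "$result"
```

Once the file has been sourced, the function stays available for the rest of the shell session, which is why the statement belongs in ~/.profile or at the top of a script.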
NoSQL sets and uses a number of shell environment variables. They are :
NSQLIB: Points to the NoSQL installation path on your system, for instance '/usr/local/lib/nosql'.
NSQSH: Points to a Bourne-compatible shell to be used in place of sh to run NoSQL shell operators. I personally use (and recommend) Kenneth Almquist's ash, which is much smaller than the stock /bin/sh provided with Linux (usually a link to /bin/bash), and is therefore faster to load. This property is important when pipelining several operators according to the Operator/Stream paradigm. If NSQSH is not set, the default is to use sh.
NSQAWK: Points to a POSIX AWK interpreter. I use (and recommend) Mike Brennan's mawk, re-compiled to make it use /bin/ash for system() and pipes. If NSQAWK is not set, the default is to use awk.
NSQTEMPF: Points to an external utility that NoSQL operators must use to create temporary work files in a safe manner. An excellent temporary file creation utility is tempfile(1), available with Debian GNU/Linux 2.0 and above. The variable, if set, must also contain all the necessary options and arguments for the work file creation utility, for instance NSQTEMPF="tempfile -m 600". Another specialized temporary file creation utility is mktemp(1), available with Red Hat and S.u.S.E. Linux. If NSQTEMPF is unset, the internal NoSQL 'tempfile' shell script is used, although I recommend using one of the aforementioned specialized programs.
NSQLOCKER: Points to an external utility that NoSQL operators must use to create lock files, according to the NoSQL table locking protocol. A very good locking utility is lockfile(1), normally distributed with the 'procmail' mail filtering program. NSQLOCKER, if set, must also contain all the necessary options and arguments for the lock file creation utility, for instance NSQLOCKER="lockfile -r1". If NSQLOCKER is unset, the internal NoSQL 'lock' script is used, but using lockfile(1) is recommended. See section lock for more details.
Points to your Perl interpreter. Defaults to 'perl' (which must be in your PATH) if unspecified.
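The requirement that NSQTEMPF and NSQLOCKER carry their own options and arguments makes sense if the variable is expanded as a complete command line. The sketch below illustrates that idea with mktemp(1). It is an assumption about the mechanism, not a copy of what the NoSQL operators actually do, and the /tmp/nsqdemo.XXXXXX template is made up for this example.

```shell
# Assumption: mktemp(1) is installed; the template is illustrative only.
NSQTEMPF="mktemp /tmp/nsqdemo.XXXXXX"

# Unquoted expansion on purpose: the variable holds a command AND its
# arguments, so the shell must split it into separate words.
workfile=$($NSQTEMPF) || { echo "cannot create work file" >&2; exit 1; }

# Use the work file, then remove it.
printf 'scratch data\n' > "$workfile"
contents=$(cat "$workfile")
rm -f "$workfile"
echo "$contents"
```

Because the value is split into words, a setting that leaves out a needed option cannot be repaired later by the caller, which is presumably why the complete command line has to be stored in the variable.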
All of the above shell environment variables can be set either in the system-wide configuration file $NSQLIB/nosql.conf or in the per-user file ~/.nosql.conf. The latter overrides the former on a per-user basis. Both files are optional and must follow the usual Bourne shell syntax. See the file $NSQLIB/nosql.conf.sample for an example configuration. If either file is present but has insecure Unix permissions, it is skipped and a warning message is printed to STDERR to let the user know about this security exposure. The mode of each file must be no more permissive than 0664, and the per-user file must be owned by the invoking user. If no $NSQLIB/nosql.conf exists yet, the installation process will automatically create one with settings for the most important configuration variables, like NSQSH, NSQTEMPF and NSQLOCKER. These variables will be set according to the utilities that are actually available on your system, i.e. those that can be found by the which(1) utility when you run 'make ; make install'.
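A per-user configuration file might look like the sketch below. The values are the ones mentioned in the variable descriptions above; whether they suit a given system depends on which utilities are installed there, and $NSQLIB/nosql.conf.sample remains the authoritative example.

```shell
# ~/.nosql.conf (sketch): plain Bourne shell variable assignments.
NSQSH=/bin/ash
NSQAWK=mawk
NSQTEMPF="tempfile -m 600"
NSQLOCKER="lockfile -r1"
```

Running 'chmod 644 ~/.nosql.conf' afterwards keeps the file well within the 0664 limit.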
In order to limit disk access as much as possible across multiple nosql invocations (e.g. in a pipeline of nosql commands) within the same shell session, the configuration files are parsed only once, upon the inclusion of the nosql() front-end shell function in the working shell. After that, if you make changes to the files, they will not become effective until you exit the current shell and start a new one, or until you re-run the statement '. $NSQLIB/sh/nosqlmain'. This does not apply if you do not include the NoSQL front-end in your shell (i.e. with '. $NSQLIB/sh/nosqlmain'), as in that case you will be using the executable NoSQL wrapper 'nosql', rather than the 'nosql()' front-end shell function.
The latter approach, however, is less efficient, so I recommend using the shell function whenever possible. This can also be done from languages other than the shell. Suppose you want to run the command 'nosql cat table.rdb | nosql fieldsof' from within a Perl script without spawning more than one shell. You can do :
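#!/usr/bin/perl
# Start a single shell that reads its commands from STDIN.
open( NOSQL, "| /bin/sh -s" ) || die "\nCan't open pipe to /bin/sh\n" ;
# Load the nosql() shell function, then run the whole pipeline in that shell.
print NOSQL ". \${NSQLIB:-/usr/lib/nosql}/sh/nosqlmain ; " .
            "nosql cat table.rdb | nosql fieldsof\n" ;
close( NOSQL ) ;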
As always, I recommend using /bin/ash rather than /bin/sh on most Unix systems.
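The same pattern, one shell that reads a whole command sequence from STDIN, can be tried without Perl. In this sketch the greet function and the tr stage are hypothetical stand-ins. A real caller would send the '. .../nosqlmain' statement followed by the nosql pipeline instead.

```shell
# Stand-in commands: a function definition, then a pipeline that uses it.
script='greet() { echo "hello $1"; }
greet world | tr "a-z" "A-Z"'

# A single /bin/sh reads both lines from STDIN and runs them in order,
# so the function defined on the first line is visible to the second.
out=$(printf '%s\n' "$script" | /bin/sh -s)
echo "$out"
```

Any language that can open a pipe to a child process can feed a shell this way.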
Note: if a variable is not set in any configuration file or in the environment, NoSQL will try to provide a reasonable default for it. Configuration and default settings can be overridden with new ones by setting them in the current shell before calling the 'nosql()' front-end shell function. This does not apply if NoSQL is called through the executable shell wrapper 'nosql'. In the latter case the calling environment is taken into account only for those variables that are set in neither the system-wide nor the per-user configuration file.
For the overrides to be effective they must be set after the inclusion of the nosql() front-end shell function in the current shell, i.e. after the '. $NSQLIB/sh/nosqlmain' statement, and they will last throughout the current shell session until they are overridden again by new settings.
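The ordering rule can be illustrated with a stand-in for the inclusion step. The temporary file and the demo_nosql function below are hypothetical and written only for this example. The sketch assumes, as the text above implies, that inclusion loads the configured value and so replaces anything set earlier.

```shell
# Hypothetical stand-in for '. $NSQLIB/sh/nosqlmain': sourcing it loads
# the configured value once and defines a front-end function.
main=$(mktemp)
cat > "$main" <<'EOF'
NSQAWK=mawk                      # value as if read from nosql.conf
demo_nosql() { echo "using $NSQAWK"; }
EOF

NSQAWK=gawk          # set BEFORE inclusion: replaced by the configured value
. "$main"
before=$(demo_nosql)

NSQAWK=gawk          # set AFTER inclusion: stays in effect for the session
after=$(demo_nosql)

rm -f "$main"
printf '%s\n%s\n' "$before" "$after"
```

The first call reports the configured interpreter and the second reports the override, which matches the requirement that overrides follow the inclusion statement.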
Following is a section for each operator, in alphabetic order.
If you use NoSQL interactively at the shell command line, typing NoSQL commands repeatedly can soon become tedious. To alleviate the problem, you can define shortcuts to the most frequently used commands in your shell ~/.profile. I personally run the Bash shell, and this is what I have in my ~/.profile file :
# Shortcuts to frequently used NoSQL commands.
alias addcol='nosql addcol'
alias col='nosql column'
alias compute='nosql compute'
alias datatype='nosql datatype'
alias etbl='nosql edit'
alias field='nosql field'
alias fieldsof='nosql fieldsof'
alias index='nosql index'
alias inscol='nosql inscol'
alias isl='nosql islist -v'
alias ist='nosql istable -v'
alias jtbl='nosql join'
alias l2t='nosql listtotable'
alias mktbl='nosql maketable'
alias myprog='nosql myprog'
alias null='nosql null -v'
alias pull='nosql pull'
alias record='nosql record'
alias rename='nosql rename'
alias repair='nosql repair'
alias rmcol='nosql rmcol'
alias row='nosql row'
alias setfirst='nosql setfirst'
alias tcat='nosql cat'
alias tsort='nosql sort'
alias subtotal='nosql subtotal'
alias summ='nosql summ'
alias tpl='nosql template'
alias trim='nosql trim'
alias ucfirst='nosql ucfirst'
alias what='nosql whatis'
alias wtbl='nosql write'
alias ptbl='nosql print'
alias search='nosql search'
If you define command shortcuts, make sure that you do not re-define already existing command names. For instance, on most Linux systems there is a col(1) command already, and defining alias col='nosql column' would override the former in interactive shell sessions. I don't care, as I rarely use col(1) interactively, but in any case whenever you define command aliases you have to know what you are doing.
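Before adding a shortcut you can check whether its name is already taken. The helper below, name_is_free, is a hypothetical name chosen for this sketch. It relies only on the standard 'command -v' lookup, which reports commands found in PATH as well as builtins, functions and aliases known to the current shell.

```shell
# Succeeds when nothing called "$1" is known to the current shell.
name_is_free() {
    ! command -v "$1" >/dev/null 2>&1
}

# Check a few of the shortcut names used above.
for name in col tsort rename ptbl; do
    if name_is_free "$name"; then
        echo "$name: free"
    else
        echo "$name: already in use"
    fi
done
```

The result depends on what is installed. On many systems tsort(1) exists alongside col(1), so both of those shortcuts would shadow a real command in interactive sessions.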