Input data formats




Manual page for Input_data_formats(PL)

proc getdata is used to read or specify plotting data. Ploticus can read tabular data from files or from command results, or the data may be embedded in the ploticus script. proc trailer may be used to place larger amounts of embedded plot data at the end of the script file, to get it out of the way.


Plotting from data fields

Plotting and data display operations are done using fields. Taking a look at the first example data set below, we might draw a bar graph using the values in field 2, and draw error bars using the values in field 3. The bars could be labeled with the values in field 4, or perhaps field 1.

If your data needs additional processing before it can be displayed the way you want, you may be able to manipulate it after it is read by ploticus, using proc processdata. This proc can perform accumulation, tabulation and counting, rewriting as percents, computation of totals, reversing record order, rotation of row/column matrix, break processing, etc.


Recognized data formats

Data files or streams should be plain ASCII text, not binary, and should be organized as a collection of rows having one or more fields. Fields may have numeric or alphanumeric content and may be delimited in one of these ways:


  • spacequoted
    	F1 2.43 0.47 "Jane Doe"   PF7955
    	F2 2.79 0.28 "John Smith" PT2705
    	F3 2.62 0.37 "Ken Brown"  PB2702
    	F4 "" "" "Bud Flippner"   PX7205
    
    Fields are delimited by one or more spaces or tabs. Fields may be enclosed in double quotes ("), and such fields may have embedded white space. Blank fields may be represented as shown.


  • whitespace
    	F1 2.43 0.47 Jane_Doe   PF7955   
    	F2 2.79 0.28 John_Smith  PT2705
    	F3 2.62 0.37 Ken_Brown  PB2702
    	F4 - - Bud_Flippner   PX7205
    	...
    
    Fields are delimited by one or more spaces or tabs. No quote processing is done. Blank fields must be represented using a code, and alphanumeric fields cannot contain white space. Parsing of whitespace data is faster than processing of spacequoted data.


  • tab delimited
    	F1	2.43	0.47	Jane Doe
    	F2	2.79	0.28	John Smith
    	F3	2.62	0.37	Ken Brown
    	F4			Bud Flippner
    	...
    
    Fields are separated by a single tab. Zero length fields are taken to be blank. Data fields cannot have embedded tabs. The first field must start at the very beginning of the line. The last field in a row may be terminated by a tab or not.


  • comma delimited
    	"F1",2.43,0.47,"Jane Doe"
    	"F2",2.79,0.28,"John Smith"
    	"F3",2.62,0.37,"Ken Brown"
    	"F4",,,"Hello""world"
    	...
    
    This format, also known as .csv or comma-quote delimited, is often produced by spreadsheets. Fields are separated by commas. Alphanumeric fields are enclosed in double quotes. Zero length fields and fields containing "" are taken to be blank. An embedded double quote is represented using ("") as seen in row F4 above. The first field must start at the very beginning of the line.
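
The four delimitation rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described on this page, not the ploticus parser itself; the function names are hypothetical, and treating a single trailing tab as a terminator is an assumed reading of the tab delimited rule.

```python
import csv
import re

# Matches either a double-quoted field (possibly empty) or a bare token.
_SPACEQUOTED_FIELD = re.compile(r'"([^"]*)"|(\S+)')


def split_spacequoted(line):
    """Split on runs of spaces/tabs; double-quoted fields may hold spaces."""
    return [bare if bare else quoted
            for quoted, bare in _SPACEQUOTED_FIELD.findall(line)]


def split_whitespace(line):
    """Split on runs of spaces/tabs with no quote processing."""
    return line.split()


def split_tab(line):
    """Split on single tabs; zero-length fields are kept as blanks."""
    line = line.rstrip("\r\n")
    if line.endswith("\t"):  # assumption: one optional terminating tab
        line = line[:-1]
    return line.split("\t")


def split_comma(line):
    """Split a comma-quote (CSV) row; "" inside quotes is a literal quote."""
    return next(csv.reader([line.rstrip("\r\n")]))
```

Applied to the F4 rows shown above, each function yields a blank (or coded) value in fields 2 and 3 and keeps the name as one field, which is the behavior each format description calls for.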


Notes:

Data that is specified within a script is subject to script processing: leading white space is stripped off and the script interpreter will attempt to evaluate constructs that look like operators or variables.

Empty rows and commented rows are ignored (the comment marker may be specified via proc getdata).

Data sets with a variable number of fields may be accommodated by specifying the proc getdata attribute nfields. Otherwise, the first usable row dictates the expected number of fields per record. If a row has more than the expected number of fields, the extra fields are silently ignored. If a row has fewer than the expected number of fields, blank fields are silently added until the record has the same number of fields as the other records. nfields may also be used to read only the first few fields on every row and ignore the rest.
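
The row-skipping and field-count rules in the notes above can be sketched in Python as follows. This is a minimal illustration, not the ploticus implementation: read_records and its parameters are hypothetical names, and the "//" default for the comment marker is an assumption made for the example.

```python
def read_records(lines, split, commentchar="//", nfields=None):
    """Return rows as equal-length field lists.

    Empty rows and rows starting with commentchar are skipped. If nfields
    is not given, the first usable row sets the expected field count.
    """
    records = []
    expected = nfields
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(commentchar):
            continue  # ignore empty and commented rows
        fields = split(line)
        if expected is None:
            expected = len(fields)  # first usable row dictates the count
        fields = fields[:expected]  # extra fields are dropped
        fields += [""] * (expected - len(fields))  # short rows are padded
        records.append(fields)
    return records
```

Passing nfields=2 with the same input would keep only the first two fields of every row, matching the last sentence of the note above.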

Leading white space is allowed when using spacequoted or whitespace delimitation. It is not allowed with the other types.

Each row, including the last one, should be terminated with the standard line terminator for your system. For Unix systems this is the newline character. For Win32 it is CR/LF; these are handled properly by MinGW builds but not by Unix builds.
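
When a data file produced on Win32 has to be read by a Unix build, one option is to convert the line terminators beforehand. The following Python sketch shows one way to do that; to_unix_newlines is a hypothetical helper, not part of ploticus.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert CR/LF to LF and make sure the last row is terminated."""
    data = data.replace(b"\r\n", b"\n")
    if data and not data.endswith(b"\n"):
        data += b"\n"  # terminate the final row as well
    return data
```

The function works on raw bytes so that nothing else in the file is reinterpreted during the conversion.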

The data parser was improved for version 2.02; earlier versions did not support zero-length fields or data sets with variable number of fields.


Missing data

Missing data values may be represented using a code or by a zero-length field, if the specific delimitation method allows them. When plotting, missing values are generally skipped over, but exactly what occurs depends on what kind of plot operation is being done. The individual plotting proc manual pages give details.
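
As a rough illustration of skipping missing values, the Python sketch below converts a field to a number and drops missing entries before plotting. The names and the choice of "-" as the missing-data code are assumptions made for the example; what ploticus does with a missing value depends on the plotting proc in use.

```python
def to_number(field, missing_codes=("", "-")):
    """Return the field as a float, or None if it is missing."""
    if field in missing_codes:
        return None
    try:
        return float(field)
    except ValueError:
        return None  # assumption: non-numeric content counts as missing


def plottable_points(fields):
    """Pair each usable value with its row index, skipping missing ones."""
    values = [to_number(f) for f in fields]
    return [(i, v) for i, v in enumerate(values) if v is not None]
```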


Embedded #set statements

Data files may contain embedded #set statements for setting ploticus variables directly from the data file. The syntax is:

#set VARIABLE = value
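
To show how such lines sit alongside ordinary data rows, here is a small Python sketch that separates #set statements from data. It only mirrors the syntax given above; split_set_statements is a hypothetical helper, and the way ploticus itself evaluates these statements may differ.

```python
import re

# "#set", a variable name, "=", then the value up to the end of the line.
_SET_LINE = re.compile(r"^#set\s+(\w+)\s*=\s*(.*?)\s*$")


def split_set_statements(lines):
    """Return (variables, data_lines) from the lines of a data file."""
    variables = {}
    data_lines = []
    for line in lines:
        match = _SET_LINE.match(line)
        if match:
            variables[match.group(1)] = match.group(2)
        else:
            data_lines.append(line)
    return variables, data_lines
```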


Examples

Gallery examples include:
scat7.dat (white-space delimited)
stock.csv (comma delimited)
timeline3 (data specified within script)
km2 (data specified within script)


data display engine  
Copyright Steve Grubb


Markup created by unroff 1.0,    December 13, 2001.