YAML is a hierarchical data format similar to the better-known JSON. Compared to JSON, it is easier for humans to read and to edit by hand in a text editor, and it is also more powerful (for example, it supports references). The syntactic difference between YAML and JSON is comparable to that between Python and C: YAML expresses nesting through indentation rather than brackets.
This makes YAML a clear and widely compatible way of storing nested data structures, in contrast to lightweight databases such as SQLite, which are restricted to tables.
yaml, the program distributed on this web page, allows automated processing of data structures from YAML files. It is still experimental software: it mostly works, but incompatible changes are possible in the future. It plays the same role for YAML that tools such as xmlstarlet or xmllint play for XML. yaml operates on the array of top-level entries in the YAML file when there are several, or (more commonly) on the array or hash/dictionary contained in its only entry. The sub-commands are described in the documentation below. An especially useful feature, not found elsewhere in a general-purpose program, is the depends command (and its counterpart rdepends), which converts a list of items and their dependencies into a dependency tree.
yaml is available in a git repository on this server that can be cloned like this:
git clone http://volkerschatz.com/repositories/yaml
yaml is written in Perl and licensed under the GNU General Public License version 3. Its main dependency is the YAML::Syck Perl module.
The program documentation is here:
usage: yaml <command> <file.yaml> [ <arguments> ... ]
Valid commands are: transform, grep, sort, cmp, extract, depends, rdepends, import, export
transform:
Applies the <Perl code> to all top-level data structures and outputs the transformation result in YAML format. A reference to the data structure is passed in $_, and the result has to be passed back in $_.
grep:
Keeps only the data structures for which the <Perl code> evaluates to true and outputs them in YAML format. A reference to the data structure is passed in $_.
sort:
Sorts top-level data structures using <Perl code> as a comparison expression. The expression must compare $a to $b, as in the code argument of Perl's sort function. If the top-level data structure is an array, $a and $b are array elements; if it is a hash, $a and $b are hash keys, and a copy of the hash is stored in %H.
cmp:
Compares the nested data structures from the two input files. Depending on the required first argument, the result contains only the substructures common to both files, or only those unique to the first or to the second file. With -d, a new top level contains the results of both -1 and -2. Equivalent structures with different scalar values in them count as unique to both files. Non-trailing array elements with equal values are replaced by null values in the unique result structures.
extract:
Outputs an array of subordinate data structures in YAML format. <path> may contain ranges (endpoints separated by "..") or wildcards ("*").
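How a path with wildcards and ranges selects sub-structures can be sketched in Python. The sketch makes several assumptions that the text above does not settle: the path is given as an already split list of components (the separator is not stated here), ranges are inclusive and apply to array indices only, and missing keys are skipped silently. The function name `extract` and the sample data are illustrative only.

```python
def extract(data, path):
    """Return the list of sub-structures reached by following `path`."""
    if not path:
        return [data]
    step, rest = path[0], path[1:]
    if isinstance(data, dict):
        if step == "*":
            children = list(data.values())
        else:
            children = [data[step]] if step in data else []
    elif isinstance(data, list):
        if step == "*":
            indices = range(len(data))
        elif ".." in step:
            # Assumption: "first..last" is an inclusive index range.
            first, last = (int(part) for part in step.split("..", 1))
            indices = range(max(first, 0), min(last, len(data) - 1) + 1)
        elif step.isdigit() and int(step) < len(data):
            indices = [int(step)]
        else:
            indices = []
        children = [data[i] for i in indices]
    else:
        children = []  # a scalar has nothing below it
    found = []
    for child in children:
        found.extend(extract(child, rest))
    return found

data = [
    {"name": "a", "deps": ["x", "y", "z"]},
    {"name": "b", "deps": ["y"]},
]

print(extract(data, ["*", "name"]))
print(extract(data, ["0", "deps", "1..2"]))
```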
depends:
Prints a dependency tree in YAML format. Each top-level data structure is denoted by its key or index. The relative <path> describes where to find the sub-structure (scalar, array or hash) that contains its dependency or dependencies. If any <node>s are specified, only their dependencies are printed.
rdepends:
Similar to the depends command, but prints the reverse dependency tree.
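To illustrate what turning a list of items and their dependencies into a tree involves, here is a minimal Python sketch. It is not the tool's Perl implementation: the item names, the `deps` key standing in for <path>, and the handling of cycles are all assumptions made for the example.

```python
# Hypothetical input: each top-level key is an item; "deps" plays the role of <path>.
items = {
    "app":    {"deps": ["libfoo", "libbar"]},
    "libfoo": {"deps": ["libc"]},
    "libbar": {"deps": "libc"},   # a scalar dependency instead of an array
    "libc":   {},
}

def direct_deps(item, path="deps"):
    """Return one item's dependencies as a list (from a scalar, array or hash)."""
    value = item.get(path)
    if value is None:
        return []
    if isinstance(value, (dict, list, tuple)):
        return list(value)        # for a hash, its keys are taken
    return [value]

def depends(items, node, path="deps", seen=()):
    """Build the nested dependency tree below `node`."""
    tree = {}
    for dep in direct_deps(items.get(node, {}), path):
        if dep == node or dep in seen:
            tree[dep] = None      # assumption: cut cycles off instead of recursing
        else:
            tree[dep] = depends(items, dep, path, seen + (node,))
    return tree

def rdepends(items, node, path="deps", seen=()):
    """Build the reverse tree: the items that depend on `node`, recursively."""
    tree = {}
    for name, item in items.items():
        if node in direct_deps(item, path):
            if name == node or name in seen:
                tree[name] = None
            else:
                tree[name] = rdepends(items, name, path, seen + (node,))
    return tree

print(depends(items, "app"))
print(rdepends(items, "libc"))
```

In this sketch a shared dependency such as `libc` appears once under every item that needs it, so the result is a tree rather than a graph.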
import:
Converts data from a different file format to YAML. The -t option forces the input file type; otherwise the file extension is used to decide.
XML:
Imports an XML file as nested associative arrays with tag names as keys.
Multiple tags with the same name are represented as an array of associative
arrays. Tag attributes have "@" prepended to their keys. Content-only
tags are represented as simple strings; tags that have both sub-tags or
attributes and text content receive the content in a key that is a single
double quote. Leading and trailing white space is removed from the
content. Requires XML::Parser::Expat and its dependency expat.
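The XML mapping described above can be sketched in Python with the standard library's ElementTree. This is only an illustration of the rules, not the tool's Expat-based Perl code. Keeping the root tag as the top-level key and ignoring text that follows a sub-tag are assumptions of the sketch, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

def element_to_data(elem):
    """Convert one element to a string (content only) or a dict."""
    text = (elem.text or "").strip()
    # Content-only tag: no attributes and no sub-tags.
    if not elem.attrib and len(elem) == 0:
        return text
    # Attributes get "@" prepended to their keys.
    data = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_data(child)
        if child.tag not in data:
            data[child.tag] = value
        elif isinstance(data[child.tag], list):
            data[child.tag].append(value)
        else:
            # A second tag with the same name turns the entry into an array.
            data[child.tag] = [data[child.tag], value]
    if text:
        data['"'] = text          # text content next to attributes or sub-tags
    return data

xml_text = (
    '<pkg name="yaml"><dep>perl</dep><dep>YAML::Syck</dep>'
    '<note> experimental </note><license version="3">GPL</license></pkg>'
)
root = ET.fromstring(xml_text)
result = {root.tag: element_to_data(root)}
print(result)
```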
HTML containing one or more tables:
Imports all tables in an HTML page as an array of arrays or of hashes (if
table headers are present). The result will likely have to be edited,
because tables are often used for layout and other purposes, so unwanted
arrays are going to end up in the YAML output. Column spanning in headers
concatenates neighbouring cells to form the value of the corresponding hash
key. Column spanning in table data cells across multiple different headers
causes the cell to be copied to all those columns. Multiple header rows at
the top of the table will be concatenated column-wise, and subdivisions in
a following row will cause the common first row header to be copied. When
tables are nested, only the outer table will be reproduced, and the inner
table(s)' cells concatenated. Tables with headers in the first row and
first column are not yet supported; they should be represented as a hash of
hashes.
JSON:
Converted to YAML one-to-one. If available, JSON::XS is used to parse the
JSON input; otherwise YAML::Syck is used, which should work with up-to-date
JSON generators.
CSV:
Comma-separated value table according to RFC 4180. Fields may be quoted by
double quotes, with original double quotes doubled in the quoted string.
Quoting with single quotes or partial quoting of fields is not allowed.
With -H, the first row is taken for table headers, and an array of
associative arrays with these keys is output.
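The quoting rule and the effect of -H can be illustrated with Python's standard csv module. This is a sketch only: Python's parser is more lenient than a strict RFC 4180 reader, and the function name `import_csv` and the sample data are invented for the example.

```python
import csv
import io

def import_csv(text, headers=False):
    """Parse CSV text into a list of rows, or a list of dicts when headers=True."""
    rows = list(csv.reader(io.StringIO(text)))
    if not headers or not rows:
        return rows
    keys, body = rows[0], rows[1:]
    return [dict(zip(keys, row)) for row in body]

# A doubled double quote inside a quoted field stands for one literal quote.
sample = 'name,comment\nyaml,"says ""hi"", twice"\n'

print(import_csv(sample))
print(import_csv(sample, headers=True))
```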
Plain text, assumed to contain a space-separated table:
-H causes the first row to be taken as table headers and an associative
array data structure to be created from each row with those keys. The
0-based index passed after -C denotes the column to be used as a key for
the top-level associative array then created; without -C, a top-level array
is generated instead.
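The combined effect of -H and -C can be sketched in Python as follows. The keyword arguments `headers` and `key_column` merely stand in for the two options, the key column is kept inside each record, and a key column without headers is accepted. All three are assumptions of this sketch rather than documented behaviour.

```python
def import_text(text, headers=False, key_column=None):
    """Turn a whitespace-separated table into nested lists and dicts."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if headers and rows:
        keys, rows = rows[0], rows[1:]
        records = [dict(zip(keys, row)) for row in rows]
    else:
        records = rows
    if key_column is None:
        return records            # top-level array
    # Top-level hash keyed by the chosen (0-based) column.
    return {row[key_column]: record for row, record in zip(rows, records)}

sample = "pkg version size\nperl 5.36 12\nexpat 2.5 1\n"

print(import_text(sample, headers=True))
print(import_text(sample, headers=True, key_column=0))
```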
DBF (dBase level 5 database file):
Imports database table as an array of hashes with field names as keys. The
deletion flag does not prevent a record from being imported but is itself
imported as the value of the "_deletion" key. Leading and trailing white
space is stripped from values. Fields of type L (boolean) are converted to
0, 1 or undef; all other fields are imported as the strings that represent
them. Thus fields of type M (strings from memo file) are imported as the
block index only; the DBT file is not parsed.
export:
Converts a YAML file to a different format.
SQLite:
Converts an array of hashes to an SQLite database. This requires the DBI
Perl module. A table named after the input file with columns named after
hash keys will be created and filled. All columns are declared as "numeric",
so SQLite stores values that look like numbers in numerical types and other
values as text. Non-scalar values will be
stored as text containing YAML expressions.
Export will be refused if a heuristic decides that the hash keys are too diverse between array entries to make sense as database table columns. The output file may already contain a database, but if a table with the target name already exists, the export is also aborted.
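The SQLite export can be illustrated with a Python sketch built on the standard sqlite3 module. It is not the tool's DBI-based code. The function name `export_sqlite` and the sample records are invented, JSON text stands in for the YAML expressions used for non-scalar values, and the key-diversity heuristic is left out because its details are not given here.

```python
import json
import sqlite3

def quote(name):
    """Quote an SQL identifier so that arbitrary hash keys can be column names."""
    return '"' + name.replace('"', '""') + '"'

def export_sqlite(records, table, db_path=":memory:"):
    """Write a list of dicts into a new table whose columns are all 'numeric'."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    if not columns:
        raise ValueError("no columns to export")
    conn = sqlite3.connect(db_path)
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists:
        conn.close()
        raise ValueError("table already exists: " + table)
    column_defs = ", ".join(quote(c) + " numeric" for c in columns)
    conn.execute("CREATE TABLE " + quote(table) + " (" + column_defs + ")")
    placeholders = ", ".join("?" for _ in columns)
    for record in records:
        values = []
        for column in columns:
            value = record.get(column)
            if isinstance(value, (dict, list)):
                value = json.dumps(value)   # stand-in for a YAML expression
            values.append(value)
        conn.execute(
            "INSERT INTO " + quote(table) + " VALUES (" + placeholders + ")",
            values,
        )
    conn.commit()
    return conn

records = [
    {"name": "a", "size": "42"},
    {"name": "b", "size": "x1", "tags": ["p", "q"]},
]
conn = export_sqlite(records, "pkgs")
rows = conn.execute(
    'SELECT name, size, typeof(size), tags FROM "pkgs" ORDER BY name'
).fetchall()
print(rows)
```

The `typeof(size)` column in the query shows the effect of the "numeric" declaration: the string "42" is stored as an integer, while "x1" remains text.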