God, a language for good ol' data

Data serialization can be better, without being too much.

{
  name = "Will";
  age = 26;
  married = false;

  favorite-movies = [
    {
      title = "Interstellar";
      starring = [
        "Matthew McConaughey"
        "Jessica Chastain"
        "Anne Hathaway"
      ];
      director = "Christopher Nolan";
      year = 2014;
    }
    {
      title = "Kill Bill: Volume 1";
      director = "Quentin Tarantino";
      starring = [
        { actor = "Uma Thurman"; character = "The Bride"; }
        { actor = "Lucy Liu"; character = "O-Ren Ishii"; }
        { actor = "David Carradine"; character = "Bill"; }
      ];
      year = 2003;
    }
    {
      title = "The Witch";
      director = "Robert Eggers";
      starring = [ "Anya Taylor-Joy" "Ralph Ineson" ];
      year = 2015;
    }
  ];

  friends = [
    {
      name = "Floyd";
      age = 29;
      married = true;
      favorite-movies = [
        {
          title = "The Departed";
          starring = [ "Leonardo DiCaprio" "Vera Farmiga" "Matt Damon" ];
          director = "Martin Scorsese";
          year = 2006;
        }
        {
          title = "Shutter Island";
          starring = [ "Leonardo DiCaprio" "Mark Ruffalo" ];
          director = "Martin Scorsese";
          year = 2010;
        }
      ];
      friends = [];
    }
  ];
}

Why?

As someone who regularly needs to both write data serialization formats by hand and work with them programmatically, I wanted a better way. I tried many formats: JSON, YAML, TOML, CSV, XML, KDL, Lua tables, Java properties, and others. You name it, I tried it. Many of them had enough nagging issues to motivate me to look for a better format, but I never found one.

"But JSON works fine"

Have you ever been in the position of writing JSON by hand, rather than just having a library parse it? If you haven't, then yes, that's a logical conclusion. Personally, I often need to write data manually, and many of the popular formats add more friction to that experience than they should. For those who need a data serialization format but never (or rarely) deal directly with the data in its storage format, this may seem like nit-picking; it feels different once you find yourself writing these formats by hand.

Background

If you feel that God's syntax is familiar, that's probably because it is. God isn't a new syntax; it is derived directly from the Nix programming language. Any valid God code can be validated directly by Nix, with nix eval -f file.god. I saw no need to create a new language when I realized Nix had exactly the bones needed to derive a flexible (and easy to understand) data serialization format. God is a subset of Nix which omits its programming syntax and features in favor of static data representation.

Some of the benefits include:

It can be validated by Nix
Conversion from God to JSON with nix eval -f file.god --json
A number of existing tools for working with Nix code can be used
- linters and formatters such as statix and nixfmt
- language servers such as nixd, nil and rnix-lsp
- A very thoroughly written Emacs mode
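
The JSON conversion listed above can also be driven from a script. The following is a minimal Python sketch, assuming the nix command-line tool is installed and that the nix eval -f file.god --json invocation quoted above works on your system (some Nix versions may require enabling the nix-command experimental feature). The function names are hypothetical and not part of any existing God tooling.

```python
import json
import subprocess


def god_to_json_command(path):
    """Build the argv for the conversion command quoted in the text above."""
    return ["nix", "eval", "-f", path, "--json"]


def load_god(path):
    """Evaluate a God file with Nix and return the decoded JSON value.

    Assumes `nix` is on PATH. A CalledProcessError is raised when Nix
    rejects the file, so this doubles as a validity check.
    """
    result = subprocess.run(
        god_to_json_command(path),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling load_god("file.god") would then return ordinary Python dicts, lists, strings, numbers, booleans and None.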

If you would like to see some sample document files, see the examples page.

Examples

These can all be found in the example/ directory of the project's Git repository.

simple.god

{
    name = "Will";
    age = 26;
    numbers = [ 9 -45 3.14 ];
    special = {
        yes = true;
        no = false;
        none = null;
    };
    
    long-string = ''
        Hello
        there!
    '';
}

package.god

{
    name = "shepherd";
    version = "1.0.5";
    licensing = [ "GPL-3.0-or-later" ];
    
    links = {
        home = "https://gnu.org/software/shepherd";
        repo = "https://codeberg.org/shepherd/shepherd.git";
    };
    
    tag = {
        release = true;
        name = "v1.0.5";
    };

    foreign = [
        "usr/share/doc/shepherd-1.0.5"
        "usr/share/guile/site/3.0/shepherd"
        "usr/lib/guile/3.0/site-ccache/shepherd"
        "usr/libexec/shepherd"
    ];
}

types.god

{
    name = "Will";
    nums = [ 1 2 3 true false null "string" ];

    mapping = { age = 26; };

    yes = true;
    no = false;
    nothing = null;

    things = {
        one = true;
        zero = false;
        nada = null;
        list = [ true false null "string" 1 2 3 { map = "self"; catch = 22; lie = true; } ];
    };

    list-of-maps = [
        {
            string-with-escapes = "\"\\there should be a single slash at the beginning when interpreted and this would be entirely quoted and\r\n\tindented on a new line here as well.\"";
            list-within-map-within-list = [ 1 2 3 true false null "\"escaped quotes\"" ];
        }
        {
            more = "less";
        }
    ];

}

directions.god

{
    directions = [
        {
            name = "north";
            cardinal = true;
        }
        {
            name = "east";
            cardinal = true;
        }
        {
            name = "west";
            cardinal = true;
        }
        {
            name = "south";
            cardinal = true;
        }
        {
            name = "down";
            cardinal = false;
        }
    ];
}

deep.god

{
    user = {
        name = "Will";
        age = 26;
        married = false;
        friends = [
            {
                name = "Floyd";
                age = 29;
                married = true;
                favorite-numbers = [ 1 2 -3.14 false null true "Hello!" 69 ];
                qualities = {
                    emotional = [ "patient" 1 "nice" null ];
                };
            }
        ];
    };
}

complex.god

{
    name = "Will";
    age = 26;
    married = false;
    favorite-movies = [
        {
            title = "Interstellar";
            director = "Christopher Nolan";
        }
        {
            title = "Kill Bill Volume 1";
            director = "Quentin Tarantino";
        }
    ];
    friends = [
        {
            name = "Floyd";
            age = 29;
            married = false;
            favorite-movies = [
                {
                    title = "Training Day";
                    director = null;
                }
                {
                    title = "The Departed";
                    director = "Martin Scorsese";
                }
            ];
            friends = [];
        }
    ];
}

string-escapes.god

{
    string = "normal string";
    special-strings = [
        "\"\\entirely quoted with a single slash at the start and\r\n\tnewline + indent here.\""
        "\" \\ this should quoted with slashes on both sides \\ \""
        "\\tabs\t\\and\t\\slashes\t\\with\t\\every\t\\word."
        "\nline-feeds above and below\n"
        "\r\ncarriage-return/line-feeds above and below\r\n"
        "\rcarriage-returns on both sides\r"
    ];
}

Implementations

Guile Scheme: wreedb/guile-god
Tree-Sitter Grammar: wreedb/tree-sitter-god

Note

If you have decided to implement the language, please contact me!

Value Types

File Structure

Values

The value types in God are intentionally rudimentary, with the goal of being useful to almost any programming language. They are flexible and have few restrictions.

Strings

Regular

A standard or regular string is represented by a pair of double quotes with any amount of text inside it.

greeting = "Hello, how are you?";

Multi-line

Strings that span multiple lines are supported. They are delimited by a pair of single quotes ('') at the beginning and another pair at the end.

about-me = ''
  Let me tell you
  about myself!
'';

Note

Multi-line strings are also indentation-aware: the indentation of the contained text is calculated relative to the leftmost column that contains meaningful (non-whitespace) text.

my-string = ''
    There are four spaces before this,
      but they will not be preserved.
'';
# produces:
# "There are four spaces before this,\n  but they will not be preserved.\n"
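
For implementers, the indentation rule can be sketched in a few lines of Python. This is a simplified illustration of the behavior described above, not the reference algorithm: the function name is hypothetical, only spaces are treated as indentation, and edge cases such as whitespace-only lines in the middle of a string may differ from what Nix does.

```python
def strip_indentation(raw):
    """Remove the common leading-space margin from a multi-line string body.

    `raw` is the text found between the opening and closing '' delimiters.
    """
    lines = raw.split("\n")
    # The opening line is dropped when it holds nothing but spaces.
    if lines and lines[0].strip(" ") == "":
        lines = lines[1:]
    # Only lines with meaningful text take part in finding the margin.
    # Tabs are deliberately not counted as indentation.
    indents = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip(" ")]
    margin = min(indents, default=0)
    return "\n".join(l[margin:] if l.strip(" ") else "" for l in lines)


body = "\n    There are four spaces before this,\n      but they will not be preserved.\n"
print(repr(strip_indentation(body)))
# 'There are four spaces before this,\n  but they will not be preserved.\n'
```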

Escaping

You can escape (double) quotes in a regular string by placing a backslash (\) before them. The same applies to line-feeds, carriage returns and tab characters (\n, \r, \t). In a multi-line string, prefix a character with ''\ to escape it.

height = "6'2\"\n";
# produces:
# 6'2"\n
greeting = ''
  I said ''\'Hello!''\'
    to them.
'';
# produces:
# "I said 'Hello!'\n  to them.\n"

Warning

The whitespace and newline following the opening '' are ignored if that line contains no meaningful (non-whitespace) text. Also, leading tab (\t) characters are not stripped from the beginning of a line, so it is best practice to indent multi-line strings with spaces unless preserving tabs is desired.
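
The backslash escapes for regular strings described above can be decoded with a small loop. The sketch below is a hypothetical Python helper, not taken from any existing implementation. It handles \n, \r and \t explicitly and assumes (as Nix does) that any other escaped character, such as \" or \\, stands for itself.

```python
# Escape sequences named in the text. Any other escaped character is
# assumed to stand for itself, e.g. \" gives " and \\ gives \.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}


def unescape_regular(body):
    """Decode the text found between the double quotes of a regular string."""
    out = []
    chars = iter(body)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling backslash at end of string")
            out.append(ESCAPES.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)


# The body of the `height` example above decodes to 6'2" plus a line-feed.
assert unescape_regular(r'''6'2\"\n''') == "6'2\"\n"
```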

Numbers

Numbers in God are not specifically integers, doubles or floats; they can be any of them. In Nix, integers have an upper and lower boundary of 9223372036854775807 and -9223372036854775807 respectively, as they are two's complement signed integers. God mimics this behavior, albeit in a slightly simpler form, since all data here is completely static.

age = 26;
age-negative = -26;
pi = 3.14;
pi-negative = -3.14;
exp = 0.27e13;

In practice, a number is a sequence of one or more numeric digits, optionally with a decimal point. It may be preceded by a negation operator and may use exponent notation with a lower- or upper-case e.

decimal-exp = -.5e10;
decimal-negative-exp = -.123E-5;

Important

Not all programming languages can represent these limits effectively; therefore the implementer should document any deviations from these limits clearly for their users.

If the technical details needed for proper usage are not documented by the implementation, the implementation MAY NOT claim to be compliant with this specification.
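
As an illustration of enforcing these limits in a language whose native integers behave differently, here is a minimal Python sketch. The token grammar in the two regular expressions is an assumption inferred from the examples in this section, not the specification's formal grammar, and parse_number is a hypothetical name.

```python
import re

# Hypothetical token grammar inferred from the examples in this section:
# an optional leading "-", then either plain digits (an integer) or digits
# with a decimal point and an optional exponent (a float).
INT_RE = re.compile(r"-?[0-9]+")
FLOAT_RE = re.compile(r"-?(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")

INT_MAX = 9223372036854775807   # upper boundary stated in the text
INT_MIN = -9223372036854775807  # lower boundary stated in the text


def parse_number(token):
    """Convert a number token to int or float, enforcing the integer limits."""
    if INT_RE.fullmatch(token):
        value = int(token)
        if not INT_MIN <= value <= INT_MAX:
            raise ValueError(f"integer out of range: {token}")
        return value
    if FLOAT_RE.fullmatch(token):
        return float(token)
    raise ValueError(f"not a number: {token!r}")
```

Python integers are arbitrary-precision, so the range check has to be explicit here; this is exactly the kind of deviation an implementation would need to document.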

Booleans

Boolean values in God are written as the unquoted text true and false. As in almost any context related to computer science, booleans are a data type used to describe something that has one of two possible values; most commonly true/false and 1/0.

yes = true;
no = false;

Some languages may have unconventional boolean data types, and therefore the implementer may want to use the closest analogue in their language, for example, in Emacs Lisp:

(setq foo t)
(setq bar nil)

There is no false boolean type in Emacs Lisp. Instead, t represents a truthy value, and nil is often used in place of a falsy value.

Warning

Due to the permissiveness of identifiers in God (as well as in Nix), it is completely valid to use true, false and even null as identifiers.

true = false;
false = null;

Using identifiers that share a name with built-in language types is highly discouraged for obvious reasons; they are documented here only for the sake of completeness.

Null

Written as the unquoted text null, this value represents nothing.

This may correlate to a language's null value, or represent the absence of a value in languages which do not have a null type. In some languages this may be best represented by 0 or an empty list ('()), as in some Lisp-style languages.

nothing = null;

It is at the discretion of the implementer to decide what makes the most sense for their use case and the capabilities of their language.

As described in booleans, null can be used as an identifier, though this is highly discouraged.

Maps

A map in God is a data structure that has an analogue in practically every programming language: Lua tables, Python dictionaries, Perl hashes, JavaScript/TypeScript objects, Lisp and Scheme association lists, and Java HashMaps, just to name a few.

The commonality is the structure of an identifier which is assigned a group of fields. Fields can be seen as key-value pairs in an abstract sense.

In God specifically, they are delimited by opening and closing "curly" braces ({ }). When a map is the value of a field, it must be followed by a termination operator; when it is used as a list element, only the fields within it require termination.

me = {
    name = "Will";
    age = 26;
    married = false;
    favorite-songs = [
        { artist = "Slint"; title = "Nosferatu Man"; }
        { artist = "OutKast"; title = "Hey Ya!"; }
    ];
    best-friend = {
        name = "Floyd";
        age = 29;
        married = true;
        favorite-songs = [
            { artist = "Tool"; title = "Lateralus"; }
            { artist = "Deafheaven"; title = "Dream House"; }
        ];
    };
};

Any type of field is allowed within a map, so long as it follows any necessary rules of field termination.

Warning

Some languages allow identifiers to be used more than once within a map or similar data structure, with the last occurrence determining its value. This is not valid in God.

me = {
    name =  "Will";
    age = 26;
    age = 25; # this is an ERROR
};
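
One way an implementation might enforce this rule is to reject a repeated identifier while building the map. The Python sketch below assumes a parser has already collected a map's fields as (identifier, value) pairs; both that representation and the build_map name are hypothetical.

```python
def build_map(pairs):
    """Build a dict from (identifier, value) pairs, rejecting duplicates."""
    result = {}
    for identifier, value in pairs:
        if identifier in result:
            # Unlike "last occurrence wins" formats, a repeat is an error.
            raise ValueError(f"duplicate identifier in map: {identifier}")
        result[identifier] = value
    return result


print(build_map([("name", "Will"), ("age", 26)]))
# {'name': 'Will', 'age': 26}
```

Passing [("name", "Will"), ("age", 26), ("age", 25)] raises ValueError instead of silently keeping 25.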

Lists

These are a grouping of any number of elements. A list may be the element assigned to a field, or be nested within another list as one of its elements. Lists are delimited by opening and closing "square" brackets ([ ]). The elements they contain are separated by whitespace.

They have no strict type requirements for their elements, meaning a single list can hold numbers, strings, maps and other lists.

favorite-movies = [
    "Interstellar"
    "The Witch"
    "Kill Bill Vol. 1"
];

list-of-lists = [
    [ 1 2 3 ]
    [ true false null ]
    [ "foo" "bar" ]
    [
        { name = "Will"; age = 26; }
        { name = "Floyd"; age = 29; }
    ]
];

Note

As you can see in the above examples, field termination is required within the fourth element of list-of-lists, since maps contain fields; the maps themselves are only elements, meaning they do not require field termination. For more clarification, see maps and fields.

Structure

Document

The document is used to describe the starting and ending boundaries of data in a God file. It is delimited by opening and closing "curly" braces ({ }). There can only be one set of document-level delimiters in a God file.

{ # document begins here
    name = "Will";
    hobbies = [ "Programming" "Watching Movies" "Playing Video Games" ];
} # document ends here

Within a document, any number of fields are allowed in any order at any level of depth.

Note

The document is not a field, therefore it does not use a field termination operator.

As stated above, only one document-level set of delimiters is allowed, meaning the following example is invalid:

{
    name = "Will";
}
{
    age = 26;
}

Operators

The following symbols are operators in God:

Assignment

This is the operator that denotes an identifier being assigned a value. It is written as an equal sign.

thing = "value";

Termination

This is the integral final part of a field, marking where a given field ends. It is required to terminate every field. It is written as a semi-colon.

numbers = { one = 1; two = 2; three = 3; };

Negation

This operator is used as a prefix to number values, denoting that the number in question is negative. It is written as a hyphen.

negative-numbers = [ -1 -2 -3 -3.14 ];
positive-numbers = [ 1 2 3 3.14 ];

Elements

An element is a raw value, either the assigned value of a field or an item in a list. It can be any of the fundamental data types.

things = [ 
    "string"
    1
    null
    false
    ''
      Multi-line
      string
    ''
    [ "another" "list" ]
    { text = "another map"; }
];

In this example, the following parts are elements:

"string"
1
null
false
'' ... Multi-line string ... '' (abbreviated)
[ "another" "list" ]
- "another"
- "list"
{ text = "another map"; }
- "another map"

Identifiers

These are unquoted text denoting the name or identity of a field.

Identifiers may not contain any of the following standard symbols:

.: period
%: percent sign
$: dollar sign, as well as other localized currency symbols
@: at sign
!: exclamation point
^: caret
&: ampersand
*: asterisk
": double-quote
`: backtick/grave
~: tilde
[]: opening and closing square brackets
{}: opening and closing curly braces
(): opening and closing parentheses
,: comma
+: plus/addition sign
=: equal sign
><: angle brackets (greater-than/less-than signs)
?: question mark
\/: back and forward slashes
;: semi-colon

Identifiers may not begin with:

-: hyphen
': single-quote
0-9: numeric digits

They may, however, contain and end with:

-: hyphen
': unpaired single-quote
_: underscore

Outside of these limits, any number of digits (0-9) and letters (A-Z, a-z) are allowed.

# containing hyphens/underscores
abc-123 = "fa so la ti do";
abc_123 = null;
    
# suffixed by hyphens/underscores
abc-123- = "fa so la ti do";
abc_123_ = null;

# highly impractical, but valid
a'b'c'1'2'3 = "do re mi";
a_-_b-'_'-c'1_2-'3' = { crazy = true; };

Warning

The use of non-ASCII Unicode characters (emojis, non-Latin characters, accented characters, etc.) in identifiers is invalid. While support for non-English languages would be desirable, implementation difficulty, security concerns, and lack of expertise with non-Latin scripts make this a strict limitation for the foreseeable future.
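
Taken together, the identifier rules can be approximated with a single regular expression. The Python sketch below is one reading of the rules above rather than an official grammar; in particular, it assumes an identifier may begin with an underscore, since underscores are not listed among the forbidden leading characters. The names are hypothetical.

```python
import re

# Assumed reading of the rules above: the first character is an ASCII
# letter or underscore; later characters may also be digits, "-" or "'".
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_'-]*")


def is_valid_identifier(text):
    """Return True when `text` satisfies the identifier rules as read above."""
    return IDENTIFIER_RE.fullmatch(text) is not None


for name in ["abc-123", "abc_123_", "a'b'c'1'2'3", "1abc", "-abc", "a.b"]:
    print(name, is_valid_identifier(name))
```

The first three names are accepted; the last three are rejected for starting with a digit, starting with a hyphen, and containing a period respectively.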

Fields

These are the most common and fundamental pieces of data in the language.

They are a combination of four parts:

identifier: the name of the field
assignment operator: to assign the name to a value
element: the field's assigned value
termination: to end the field declaration

name = "Will";
favorite-things = [ "Movies" "Programming" ];
favorite-movie = {
    title = "Interstellar";
    director = "Christopher Nolan";
    release-year = 2014;
    leading-roles = [
        {
            character = "Joseph Cooper";
            actor = "Matthew McConaughey";
        }
        {
            character = "Dr. Amelia Brand";
            actor = "Anne Hathaway";
        }
    ];
};

In this example, there are 11 fields in total. Let's break down the biggest one:

favorite-movie = {
    title = "Interstellar";
    director = "Christopher Nolan";
    release-year = 2014;
    leading-roles = [
        {
            character = "Joseph Cooper";
            actor = "Matthew McConaughey";
        }
        {
            character = "Dr. Amelia Brand";
            actor = "Anne Hathaway";
        }
    ];
};

Here, there are nine fields in total, including the favorite-movie map itself. Within it, there are eight fields:

identifier	assigned value
title	string `"Interstellar"`
director	string `"Christopher Nolan"`
release-year	number `2014`
leading-roles	a list containing two maps as elements

The first element of leading-roles:

identifier	assigned value
character	string `"Joseph Cooper"`
actor	string `"Matthew McConaughey"`

The second element of leading-roles:

identifier	assigned value
character	string `"Dr. Amelia Brand"`
actor	string `"Anne Hathaway"`

Important

It is an important distinction that the two elements within the leading-roles list are NOT fields; they are elements that contain fields.
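
The field counts in this section can be checked mechanically. The Python sketch below assumes the example has already been parsed into dicts (for maps) and lists; the count_fields helper is hypothetical. Each map key counts as one field, while list items are treated as elements that may themselves contain fields.

```python
def count_fields(value):
    """Count fields in a parsed God value (dicts as maps, lists as lists)."""
    if isinstance(value, dict):
        # Every key is one field; its value may hold further fields.
        return sum(1 + count_fields(v) for v in value.values())
    if isinstance(value, list):
        # List items are elements, not fields, but maps inside them
        # still contribute their own fields.
        return sum(count_fields(item) for item in value)
    return 0


document = {
    "name": "Will",
    "favorite-things": ["Movies", "Programming"],
    "favorite-movie": {
        "title": "Interstellar",
        "director": "Christopher Nolan",
        "release-year": 2014,
        "leading-roles": [
            {"character": "Joseph Cooper", "actor": "Matthew McConaughey"},
            {"character": "Dr. Amelia Brand", "actor": "Anne Hathaway"},
        ],
    },
}

print(count_fields(document))                    # 11
print(count_fields(document["favorite-movie"]))  # 8
```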

Whitespace

The following are considered whitespace in God:

name	value
space characters	`\x20`
tab characters	`\t`
line-feed (LF)	`\n`
carriage-return (CR)	`\r`
comments	Varied

Note

These characters will not be considered whitespace when used within strings. See the warning for specific details related to the handling of multi-line strings.

Comments

Comments in God only come in one variety, line comments:

# this is a line comment, occupying an entire line.
name = "Will"; # this is another line comment, occupying only the end of the line

Despite Nix supporting multi-line and mid-statement comments, God does not. In practice, any meaning that can be conveyed through comments can be conveyed well enough by line comments (whether they occupy a full line or only the end of one), making the other types unnecessary.
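
Because only line comments exist, removing them is a single left-to-right scan per line. The following Python sketch uses a hypothetical helper name and is deliberately simplified: it tracks only regular double-quoted strings, so a # inside a multi-line ('' ... '') string is not handled.

```python
def strip_line_comment(line):
    """Return `line` with any trailing # comment removed.

    Simplification: only regular double-quoted strings are tracked, so a
    '#' inside a multi-line ('' ... '') string is not handled here.
    """
    in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "#":
            return line[:i]
        i += 1
    return line


print(repr(strip_line_comment('name = "Will"; # a comment')))
# 'name = "Will"; '
```

A # that appears inside a regular string, as in tag = "#1";, is left untouched.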

Terminology explanation

Terminology regarding comments is often a source of confusion, so in an effort to make it perfectly clear, here are the different types of comments, using C as the language to demonstrate.

Multi-line "Block" comments:

int main() {
    /* this is foo
     * it equals two
     * */
    int foo = 2;
    return 0;
}

Line comments:

int main() {
    // this is foo
    int foo = 2; // it equals two
    return 0;
}

Mid-statement comments:

int main() {
    int /* this is foo, it equals two */ foo = 2;
    return 0;
}

As stated above, God only supports line comments; the other types are invalid.
