#+TITLE: Sixth - system for data storage, computation, exploration and interaction
-----
- This is a subproject of [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth]]
- [[http://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=snapshot;h=HEAD;sf=tgz][download latest snapshot]]
- This program is free software; you can redistribute it and/or modify
  it under the terms of version 3 of the [[https://www.gnu.org/licenses/lgpl.html][GNU Lesser General Public
  License]] or later as published by the Free Software Foundation.
- Program author:
- Svjatoslav Agejenko
- Homepage: http://svjatoslav.eu
- Email: mailto://svjatoslav@svjatoslav.eu
- [[http://svjatoslav.eu/programs.jsp][other applications hosted at svjatoslav.eu]]
* (document settings) :noexport:
** use dark style for TWBS-HTML exporter
* Vision / goal
Provide versioned, clustered, flexible, object-relational database
functionality for the [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth computation engine]].
+ I hate object-relational impedance mismatch.
+ I don't like converting data between a persistent database and
  runtime objects for every transaction. How about creating a unified
  database/computation engine instead, to:
  + Eliminate the constant moving and converting of data between two
    systems.
  + Abstract away the difference between RAM and persistent storage.
    Let the system decide at runtime which data to keep in which kind
    of memory.
** Inspiration
+ Relational databases:
  + Transactional.
  + Indexable / quickly searchable.
+ Git (version control system):
  + Versionable.
  + Branchable / mergeable.
  + Transparent consistency, checksumming and deduplication.
  + (Git as a database:
    https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/ )
** Solution (the big idea)
I envision a 4D data structure.
[[file:data model.png]]
Dimensions:
+ List of all the objects in the system (rows).
+ List of all declared unique object fields (columns).
+ List of all historical transactions/commits/versions (think of
  sheets of paper).
+ List of all concurrently running branches/threads. Branches can
  appear and merge over time as needed.
+ (Every cell is a concrete field value within an object.)
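
To make the four dimensions concrete, here is a minimal in-memory
sketch in Python. It is only an illustration under assumptions: the
names =CellKey= and =FourDStore= and the rule "a read sees the newest
value at or before the requested version" are hypothetical and are not
taken from the Sixth code base.

#+BEGIN_SRC python
from typing import NamedTuple


class CellKey(NamedTuple):
    """Coordinates of one cell; all four names are illustrative."""
    object_id: str  # row: which object
    field: str      # column: which declared field
    version: int    # sheet of paper: which transaction/commit
    branch: str     # which concurrently running branch


class FourDStore:
    """Toy store that addresses every field value by four coordinates."""

    def __init__(self):
        self._cells = {}

    def put(self, object_id, field, version, branch, value):
        self._cells[CellKey(object_id, field, version, branch)] = value

    def get(self, object_id, field, version, branch):
        """Return the newest value at or before `version` on `branch`."""
        candidates = [
            key for key in self._cells
            if key.object_id == object_id and key.field == field
            and key.branch == branch and key.version <= version
        ]
        if not candidates:
            raise KeyError((object_id, field, version, branch))
        newest = max(candidates, key=lambda key: key.version)
        return self._cells[newest]


store = FourDStore()
store.put("user-1", "name", 1, "main", "Alice")
store.put("user-1", "name", 3, "main", "Alicia")
#+END_SRC

Reading =store.get("user-1", "name", 2, "main")= falls back to the
value written in version 1, because version 2 did not touch that cell.
Branch merging is deliberately left out of this sketch.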
Partitioning/clustering:
+ Why not partition (load balance) the data as required across
  networked physical computers along any of the dimensions declared
  above?
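
One possible way to partition along a chosen dimension is to hash the
cell's coordinate on that dimension and map the hash to a node number.
The sketch below assumes cells are addressed by
=(object_id, field, version, branch)= tuples; the function =node_for=
and the dimension names are hypothetical, and SHA-256 is an arbitrary
choice of stable hash.

#+BEGIN_SRC python
import hashlib

# Assumed order of coordinates inside a cell key (illustrative only).
DIMENSIONS = ("object_id", "field", "version", "branch")


def node_for(cell_key, dimension, node_count):
    """Pick a node by hashing the key's coordinate along one dimension."""
    coordinate = str(cell_key[DIMENSIONS.index(dimension)])
    digest = hashlib.sha256(coordinate.encode("utf-8")).digest()
    # A stable hash keeps the assignment identical across processes.
    return int.from_bytes(digest[:8], "big") % node_count


cell_a = ("user-1", "name", 1, "main")
cell_b = ("user-1", "email", 7, "experiment")

# Partitioning by object keeps all cells of one object on the same node.
same_node = node_for(cell_a, "object_id", 4) == node_for(cell_b, "object_id", 4)
#+END_SRC

Partitioning by =branch= or =version= instead only requires passing a
different dimension name; which dimension balances load best depends on
the workload.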
Indexing (for fast searching):
+ Why not index along arbitrary dimensions (as required)?
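
As a rough illustration of indexing along one dimension, the sketch
below groups cell keys by their coordinate on that dimension. The tuple
layout =(object_id, field, version, branch)= and the helper
=build_index= are assumptions made for this example only.

#+BEGIN_SRC python
from collections import defaultdict

# Assumed order of coordinates inside a cell key (illustrative only).
DIMENSIONS = ("object_id", "field", "version", "branch")


def build_index(cells, dimension):
    """Group cell keys by their coordinate along one dimension."""
    position = DIMENSIONS.index(dimension)
    index = defaultdict(set)
    for key in cells:
        index[key[position]].add(key)
    return index


cells = {
    ("user-1", "name", 1, "main"): "Alice",
    ("user-2", "name", 1, "main"): "Bob",
    ("user-1", "name", 2, "draft"): "Alicia",
}

by_branch = build_index(cells, "branch")
by_object = build_index(cells, "object_id")
#+END_SRC

With such an index, finding every cell on the =draft= branch is a
single dictionary lookup instead of a scan over all cells.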
Further optimizations:
+ At this early stage, I am trying to focus on the smallest possible
  set of features that provides the greatest possible power/benefit :)
+ Once the features are locked, anything can be optimized.
  Optimization for size (deduplication) can be solved using a
  Git-style content-addressable storage mechanism.
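
The deduplication idea can be sketched as a tiny content-addressable
store: every value is stored under the hash of its own bytes, so
identical values occupy one slot. The class =ContentStore= is
hypothetical, and SHA-256 is chosen here only for illustration.

#+BEGIN_SRC python
import hashlib


class ContentStore:
    """Toy content-addressable store: the address is the content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same address: stored only once.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hashing on read doubles as a consistency check.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self):
        return len(self._blobs)


content_store = ContentStore()
first = content_store.put(b"same value")
second = content_store.put(b"same value")
other = content_store.put(b"different value")
#+END_SRC

Here =first= and =second= are the same address, and the store holds two
blobs rather than three. Cells in the 4D structure could then hold
addresses instead of raw values.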
* Current status
- Implemented a very simple persistent key-value map.
  The long-term goal is to implement more advanced features on top of
  it.
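
For readers unfamiliar with the term, a persistent key-value map can be
as small as the Python sketch below. It is not the Sixth
implementation, whose storage format is not described here; the class
=PersistentMap= and its append-only JSON log are assumptions chosen to
keep the example short.

#+BEGIN_SRC python
import json
import os
import tempfile


class PersistentMap:
    """Toy key-value map that survives restarts via an append-only log."""

    def __init__(self, path):
        self._path = path
        self._data = {}
        if os.path.exists(path):
            # Replay the log; later entries overwrite earlier ones.
            with open(path, encoding="utf-8") as log:
                for line in log:
                    key, value = json.loads(line)
                    self._data[key] = value

    def put(self, key, value):
        # Write to disk first, then update the in-memory view.
        with open(self._path, "a", encoding="utf-8") as log:
            log.write(json.dumps([key, value]) + "\n")
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


path = os.path.join(tempfile.mkdtemp(), "map.log")
writer = PersistentMap(path)
writer.put("greeting", "hello")
writer.put("greeting", "hello again")

# A fresh instance reads the same file, simulating a restart.
reopened = PersistentMap(path)
#+END_SRC

After the simulated restart, =reopened.get("greeting")= returns the
last value written. The log is never compacted in this sketch, so a
real implementation would need to reclaim space.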
* TODO
** check out Magma
+ http://wiki.squeak.org/squeak/2665