#+TITLE: Sixth - system for data storage, computation, exploration and interaction

-----
- This is a subproject of [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth]]
- [[http://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=snapshot;h=HEAD;sf=tgz][download latest snapshot]]

- This program is free software; you can redistribute it and/or modify
  it under the terms of version 3 of the [[https://www.gnu.org/licenses/lgpl.html][GNU Lesser General Public License]]
  or later as published by the Free Software Foundation.

- Program author:
  - Svjatoslav Agejenko
  - Homepage: http://svjatoslav.eu
  - Email: mailto://svjatoslav@svjatoslav.eu

- [[http://svjatoslav.eu/programs.jsp][other applications hosted at svjatoslav.eu]]

* (document settings) :noexport:
** use dark style for TWBS-HTML exporter
#+HTML_HEAD:
#+HTML_HEAD:
#+HTML_HEAD:
#+HTML_HEAD:

* Vision / goal
Provide versioned, clustered, flexible, object-relational database
functionality for the [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth computation engine]].

+ I hate object-relational impedance mismatch.
+ I don't like converting data between a persistent database and
  runtime objects for every transaction.

How about creating a unified database/computation engine instead, to:
+ Eliminate the constant moving and converting of data between 2
  systems.
+ Abstract away the difference between RAM and persistent storage. Let
  the system decide at runtime which data to keep in what kind of
  memory.

** Inspiration
+ Relational databases:
  + Transactional.
  + Indexable / quickly searchable.

+ Git (version control system):
  + Versionable.
  + Branchable / mergeable.
  + Transparent consistency, checksumming and deduplication.
  + (Git as a database:
    https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/ )

** Solution (the big idea)
I see a 4D data structure.

[[file:data model.png]]

Dimensions:
+ List of all the objects in the system (rows).
+ List of all declared unique object fields (columns).
+ List of all historical transactions/commits/versions (think of
  sheets of paper).
+ List of all concurrently running branches/threads. Branches can
  appear and merge over time as needed.
+ (Every cell is a concrete field value within an object.)

Partitioning/clustering:
+ Why not partition (load-balance) as required across networked
  physical computers along any of the dimension(s) declared above?

Indexing (for fast searching):
+ Why not index along arbitrary dimensions (as required)?

Further optimizations:
+ At the current early stage, the focus is on the minimum possible set
  of features that would provide the maximum possible power/benefit :)
+ Once features are locked, anything can be optimised. Optimization
  for size (deduplication) can be solved using a Git-style
  content-addressable storage mechanism.

* Current status
- Implemented a very simple persistent key-value map. The long-term
  goal is to implement more advanced features on top of this.

* TODO
** check out Magma
+ http://wiki.squeak.org/squeak/2665
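
* Sketch of the 4D data model
The four dimensions listed under "Solution (the big idea)" can be
illustrated with a small in-memory toy. This is only a sketch under
stated assumptions, not the actual Sixth implementation: the names
=FourDStore=, =Branch=, =fork=, =commit= and =get= are hypothetical, and
versions are assumed to be simple per-branch counters. A cell is
addressed by (object, field, version, branch); a read on a branch falls
back to the parent branch as it was at the fork point.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class Branch:
    """Hypothetical branch record: forked from `parent` at `fork_version`."""
    name: str
    parent: Optional["Branch"] = None
    fork_version: int = 0


class FourDStore:
    """Toy store where a cell address is (object, field, version, branch)."""

    def __init__(self) -> None:
        self.branches: Dict[str, Branch] = {"main": Branch("main")}
        self.head: Dict[str, int] = {"main": 0}  # latest version per branch
        # (branch, object id, field) -> {version: value}
        self.cells: Dict[Tuple[str, str, str], Dict[int, Any]] = {}

    def fork(self, parent: str, name: str) -> None:
        """Create branch `name` starting from the current head of `parent`."""
        self.branches[name] = Branch(name, self.branches[parent], self.head[parent])
        self.head[name] = self.head[parent]

    def commit(self, branch: str, changes: Dict[Tuple[str, str], Any]) -> int:
        """Write a set of (object, field) -> value changes as one new version."""
        version = self.head[branch] + 1
        for (obj, fld), value in changes.items():
            self.cells.setdefault((branch, obj, fld), {})[version] = value
        self.head[branch] = version
        return version

    def get(self, obj: str, fld: str, branch: str = "main",
            version: Optional[int] = None) -> Any:
        """Read a cell as seen from `branch` at `version` (default: head)."""
        b: Optional[Branch] = self.branches[branch]
        limit = self.head[branch] if version is None else version
        while b is not None:
            history = self.cells.get((b.name, obj, fld), {})
            visible = [v for v in history if v <= limit]
            if visible:
                return history[max(visible)]
            # Nothing on this branch: look at the parent as of the fork point.
            limit = min(limit, b.fork_version)
            b = b.parent
        return None


store = FourDStore()
store.commit("main", {("user:1", "name"): "Alice", ("user:1", "age"): 30})
store.fork("main", "experiment")
store.commit("experiment", {("user:1", "age"): 31})
store.commit("main", {("user:1", "name"): "Alicia"})
```

In this toy, the change made on =experiment= is invisible from =main=,
and the later rename on =main= is invisible from =experiment=, because
reads on a branch never look past its fork point. Merging, persistence,
partitioning and indexing are deliberately left out.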