-#+TITLE: Sixth - system for data storage, computation, exploration and interaction
+#+TITLE: Sixth Data - Data storage and computing engine
------
-- This is a subproject of [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth]]
+* (document settings) :noexport:
+** use dark style for TWBS-HTML exporter
+#+HTML_HEAD: <link href="https://bootswatch.com/3/darkly/bootstrap.min.css" rel="stylesheet">
+#+HTML_HEAD: <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
+#+HTML_HEAD: <script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.min.js"></script>
+#+HTML_HEAD: <style type="text/css">
+#+HTML_HEAD: footer {background-color: #111 !important;}
+#+HTML_HEAD: pre {background-color: #111; color: #ccc;}
+#+HTML_HEAD: </style>
-- [[http://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=snapshot;h=HEAD;sf=tgz][download latest snapshot]]
+* General
+- This is a subproject of [[https://www3.svjatoslav.eu/projects/sixth/][Sixth]]
-- This program is free software; you can redistribute it and/or modify
- it under the terms of version 3 of the [[https://www.gnu.org/licenses/lgpl.html][GNU Lesser General Public
- License]] or later as published by the Free Software Foundation.
+- This program is free software: you can redistribute it and/or modify
+ it under the terms of the [[https://www.gnu.org/licenses/lgpl.html][GNU Lesser General Public License]] as
+ published by the Free Software Foundation, either version 3 of the
+ License, or (at your option) any later version.
- Program author:
- Svjatoslav Agejenko
- - Homepage: http://svjatoslav.eu
+ - Homepage: https://svjatoslav.eu
- Email: mailto://svjatoslav@svjatoslav.eu
-- [[http://svjatoslav.eu/programs.jsp][other applications hosted at svjatoslav.eu]]
+- [[https://www.svjatoslav.eu/projects/][Other software projects hosted at svjatoslav.eu]]
+** Source code
+- [[https://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=snapshot;h=HEAD;sf=tgz][Download latest snapshot in TAR GZ format]]
-* (document settings) :noexport:
-** use dark style for TWBS-HTML exporter
-#+HTML_HEAD: <link href="https://bootswatch.com/darkly/bootstrap.min.css" rel="stylesheet">
-#+HTML_HEAD: <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.2/jquery.min.js"></script>
-#+HTML_HEAD: <script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.1/js/bootstrap.min.js"></script>"
-#+HTML_HEAD: <style type="text/css">
-#+HTML_HEAD: footer {background-color: #111 !important;}
-#+HTML_HEAD: pre {background-color: #111; color: #ccc;}
-#+HTML_HEAD: </style>
+- [[https://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=summary][Browse Git repository online]]
+
+- Clone Git repository using command:
+ : git clone https://www2.svjatoslav.eu/git/sixth-data.git
+
+- See [[https://www3.svjatoslav.eu/projects/sixth-data/apidocs/][JavaDoc]].
* Vision / goal
-Provide versioned, clustered, flexible, object-relational database
-functionality for the [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth computation engine]].
+ :PROPERTIES:
+ :ID: f6764282-a6f6-44e6-8716-b428074dd093
+ :END:
+Provide a versioned, clustered, flexible, distributed, multi-dimensional
+data storage engine for the [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth computation engine]].
+
++ Speaking of traditional relational databases and object-oriented
+ business applications:
+
+ + I hate object-relational impedance mismatch.
-+ I hate object-relational impedance mismatch.
+ + I don't like converting data between the persistent database and
+ runtime objects for every transaction. How about creating a unified
+ database/computation engine instead, to:
+
+ + Eliminate the constant moving and converting of data between
+ two systems, and make computation happen close to where the data
+ is stored.
-+ I don't like to convert data between persistent database and runtime
- objects for every transaction. How about creating united
- database/computation engine instead to:
- + Eliminate constant moving and converting of data between 2 systems.
+ + Abstract away the difference between RAM and persistent storage. Let
the system decide at runtime which data to keep in what kind of
memory.
-** Inspiration
+* Inspiration
+ Relational databases:
+ Transactional.
+ Indexable / Quickly searchable.
+ Branchable / mergeable.
+ Transparent consistency, checksumming and deduplication.
+ (Git as a database:
- https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/ )
-
-** Solution (the big idea)
-I see 4D data structure.
-
-[[file:data model.png]]
-
-Dimensions:
-+ List of all the objecs in the system (rows).
-+ List of all declared unique object fields (columns).
-+ List of all historical transactions/commits/versions (think of
- sheets of paper).
-+ List of all concurrently running branches/threads. Branches can
- appear and merge over time as needed.
-+ (Every cell is concrete field value within an object)
-
-Partitioning/clustering:
-+ Why not to partition/(load balance) as required across networked
- physical computers along arbitrary dimension(s) declared above ?
-
-Indexing (for fast searching):
-+ Why not to index along arbitrary dimensions (as required) ?
-
-Further optimizations:
-+ In current early stage, trying to focus on minimum possible set of
- features that would provide maximum possible set of power/benefit :)
-+ Once featres are locked. Anything can be optimised. Optimization for
- size (deduplication) can be solved using Git style content
- addressible storage mechanism.
+ https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/ )
+
+** Brain
+ :PROPERTIES:
+ :ID: d2375acc-af14-4f18-8ad0-7949501178c5
+ :END:
++ The brain appears to have a design with more than three dimensions:
+ https://singularityhub.com/2017/06/21/is-there-a-multidimensional-mathematical-world-hidden-in-the-brains-computation/
+
++ The brain appears to use geometry to map thoughts and even sounds:
+ https://www.quantamagazine.org/the-brain-maps-out-ideas-and-memories-like-spaces-20190114/
+
+
++ It directly inspires the following ideas:
+ + [[id:5d287158-53ea-44a2-a754-dd862366066a][Distributed computation and data storage]]
+ + [[id:a117c11e-97c1-4822-88b2-9fc10f96caec][Mapping of hyperspace to traditional object-oriented model]]
+ + [[id:b6b15bd2-c78b-4c51-a343-72843a515c29][Handling of relations]]
+* Ideas
+** Distributed computation and data storage
+ :PROPERTIES:
+ :ID: 5d287158-53ea-44a2-a754-dd862366066a
+ :END:
+Maybe every problem can be translated to geometry (use any shapes and
+as many dimensions as you need). Solution(s) to such problems would
+then appear as relatively simple search/comparison/lookup results. As
+a bonus, such geometrical *data storage* AND *computation* can
+naturally be made *parallel* and *distributed*. That's what neurons in
+the brain appear to be doing! :) Learning means building/updating
+the model (the hard part). Answering a question means making
+(relatively simple) lookups (geometrical queries) against the model.
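The sketch below illustrates "answering a question is a lookup against the model". It assumes the model is simply a set of labelled points. A question is a query point, and the answer is the label of the nearest stored point, found by brute-force Euclidean nearest-neighbour search. The class name GeometricModel and its methods are hypothetical and are not part of Sixth Data.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a "model" is a set of labelled points in n-dimensional
 * space, and answering a question is a nearest-neighbour lookup against it.
 */
final class GeometricModel {
    private final int dimensions;
    private final List<double[]> points = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    GeometricModel(int dimensions) {
        this.dimensions = dimensions;
    }

    /** "Learning": add one labelled point to the model. */
    void learn(String label, double... coordinates) {
        checkDimensions(coordinates);
        points.add(coordinates.clone());
        labels.add(label);
    }

    /** "Question answering": label of the closest stored point, or null if empty. */
    String nearest(double... query) {
        checkDimensions(query);
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < points.size(); i++) {
            double distance = squaredDistance(points.get(i), query);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = labels.get(i);
            }
        }
        return best;
    }

    private void checkDimensions(double[] coordinates) {
        if (coordinates.length != dimensions) {
            throw new IllegalArgumentException("expected " + dimensions + " coordinates");
        }
    }

    /** Squared Euclidean distance; the square root is not needed for comparison. */
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double delta = a[d] - b[d];
            sum += delta * delta;
        }
        return sum;
    }
}
```

The overall nearest point is always the nearest among the per-partition winners. So the scan in nearest() could be split across machines, each holding a subset of the points, and only the winners would need to be compared at the end.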
+** Mapping of hyperspace to traditional object-oriented model
+ :PROPERTIES:
+ :ID: a117c11e-97c1-4822-88b2-9fc10f96caec
+ :END:
+Object-oriented programming is inspired by the way the human mind
+operates. It allows a programmer to express ideas to a computer in
+more human-like terms.
+
+It is possible to map object model to geometrical hyperspace:
+
++ An object is a point in space (universe). Each object member
+ variable translates to its own dimension. That is: if a class
+ declares 4 member variables, then each of its instances can be
+ stored as a single point inside a 4-dimensional space. Variable
+ values translate to point coordinates in space. Integers, floating
+ point numbers, and even booleans and strings can be translated to a
+ linear value that can be used as a coordinate along a particular
+ dimension.
+
++ Each class declares its own space (universe). All class instances
+ (objects) are points inside that particular universe. References
+ between objects of different types are hyperlinks (portals) between
+ different universes.
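Here is a sketch of this mapping for a hypothetical Person class with 4 member variables, whose instances become points in 4-dimensional space. Integers and floating point numbers are used directly, and booleans become 0 or 1. The string encoding is an assumption made for illustration. It reads the first three UTF-16 characters as base-65536 digits of a fraction, which keeps alphabetical order for strings that differ within those characters.

```java
/**
 * Hypothetical sketch: an object becomes a point, each member variable becomes
 * one dimension, and each variable value becomes a coordinate.
 */
final class ObjectAsPoint {

    /** Example class declaring 4 member variables, so it lives in 4D space. */
    static final class Person {
        final String name;
        final int birthYear;
        final double heightMeters;
        final boolean employed;

        Person(String name, int birthYear, double heightMeters, boolean employed) {
            this.name = name;
            this.birthYear = birthYear;
            this.heightMeters = heightMeters;
            this.employed = employed;
        }
    }

    /** Names of the dimensions of the Person universe, in coordinate order. */
    static final String[] PERSON_DIMENSIONS = {"name", "birthYear", "heightMeters", "employed"};

    /** Maps one Person instance to a point in the Person universe. */
    static double[] toPoint(Person p) {
        return new double[] {
            stringToCoordinate(p.name),
            p.birthYear,            // integer: used directly
            p.heightMeters,         // floating point: used directly
            p.employed ? 1.0 : 0.0  // boolean: false = 0, true = 1
        };
    }

    /**
     * Assumed string encoding: the first three UTF-16 chars are read as
     * base-65536 digits of a fraction in [0, 1); missing chars count as 0.
     * The 48 bits fit exactly in a double, so strings whose first three chars
     * differ keep their String.compareTo order (ignoring strings ending in NUL
     * characters). Strings sharing the first three chars get the same coordinate.
     */
    static double stringToCoordinate(String s) {
        double value = 0;
        double scale = 1.0 / 65536;
        for (int i = 0; i < 3; i++) {
            int c = i < s.length() ? s.charAt(i) : 0;
            value += c * scale;
            scale /= 65536;
        }
        return value;
    }
}
```

For example, new Person("Ada", 1815, 1.65, false) becomes a point whose last three coordinates are 1815.0, 1.65 and 0.0.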
+** Handling of relations
+ :PROPERTIES:
+ :ID: b6b15bd2-c78b-4c51-a343-72843a515c29
+ :END:
+Suppose we want to create a database of books and authors. A book can
+have multiple authors, and a single person can be an author of
+multiple books. Using hyperspace, it is possible to store how many
+hours of work each author has contributed to every book, as follows:
+
++ Every dimension corresponds to one particular book author (10
+ authors in the database would require a 10-dimensional space).
+ + A point in space corresponds to one particular book.
+ + The point's location along a particular (author) dimension
+ corresponds to the amount of work contributed by that author to
+ the given book.
+
+Alternatively:
+
++ Every dimension corresponds to one particular book.
+ + A point in space corresponds to one particular author in the
+ entire database.
+ + The point's location along a particular (book) dimension
+ corresponds to the amount of work the given author (point)
+ contributed to that book.
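The sketch below shows the first layout and assumes sparse storage. A book records coordinates only for the authors who worked on it, and every other coordinate is implicitly 0. Transposing the same data gives the alternative layout, with authors as points and books as dimensions. AuthorshipSpace and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch of the books/authors example: each book is a point,
 * each author is a dimension, and the coordinate is hours contributed.
 * Points are stored sparsely; a missing coordinate means 0 hours.
 */
final class AuthorshipSpace {
    // book -> (author dimension -> hours)
    private final Map<String, Map<String, Double>> books = new TreeMap<>();

    /** Moves the book's point along the author's dimension by the given hours. */
    void addWork(String book, String author, double hours) {
        books.computeIfAbsent(book, b -> new TreeMap<>()).merge(author, hours, Double::sum);
    }

    /** Coordinate of a book along an author dimension (0 if no contribution). */
    double hours(String book, String author) {
        return books.getOrDefault(book, Map.of()).getOrDefault(author, 0.0);
    }

    /** The dimensions of the space: every author who contributed to any book. */
    Set<String> authorDimensions() {
        Set<String> authors = new TreeSet<>();
        books.values().forEach(point -> authors.addAll(point.keySet()));
        return authors;
    }

    /** The alternative layout: authors become points, books become dimensions. */
    Map<String, Map<String, Double>> transpose() {
        Map<String, Map<String, Double>> authors = new TreeMap<>();
        books.forEach((book, point) ->
            point.forEach((author, h) ->
                authors.computeIfAbsent(author, a -> new TreeMap<>()).put(book, h)));
        return authors;
    }
}
```

For example, suppose you call addWork("Book A", "Alice", 120) and then addWork("Book B", "Alice", 45). Afterwards, transpose().get("Alice") holds the point {Book A=120.0, Book B=45.0}.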
+
+** Layered architecture
++ layer 1 :: disk / block storage / partition
+
++ layer 2 :: key/value storage. Keys are unique and are dictated by
+ the storage engine. A value is an arbitrary byte array of
+ limited size. This layer is responsible for handling disk
+ defragmentation and consistency during crash
+ recovery.
+
++ layer 3 :: key/value storage. Keys are content hashes. Values are
+ arbitrary content byte arrays of limited size. This
+ layer effectively implements content addressable
+ storage. Content addressable storage enables Git-like
+ behavior (possibility for competing branches, retaining
+ history, transparent deduplication).
+
++ layer 4 :: Implements an arbitrary-dimensional multiverse.
+
++ layer 5 :: Distributed computation engine.
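The following in-memory sketch shows how layer 3 could sit on top of layer 2. It uses SHA-256 as the content hash and a 4096-byte value size limit. Both are assumptions, not documented choices of Sixth Data. Layer 2 assigns its own keys, while layer 3 keys values by their hash. Storing identical content twice therefore yields the same key and uses only one block, which is the transparent deduplication mentioned above. All class and method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory sketch of layers 2 and 3. Layer 2 hands out its own
 * unique keys; layer 3 addresses content by its SHA-256 hash on top of layer 2.
 */
final class LayeredStorageSketch {

    /** Layer 2: key/value store where the engine dictates the keys. */
    static final class BlockStore {
        static final int MAX_VALUE_SIZE = 4096; // assumed size limit
        private final Map<Long, byte[]> blocks = new HashMap<>();
        private long nextKey = 0;

        long put(byte[] value) {
            if (value.length > MAX_VALUE_SIZE) {
                throw new IllegalArgumentException("value too large");
            }
            long key = nextKey++;
            blocks.put(key, value.clone());
            return key;
        }

        byte[] get(long key) {
            byte[] value = blocks.get(key);
            return value == null ? null : value.clone();
        }

        int size() {
            return blocks.size();
        }
    }

    /** Layer 3: content addressable storage; the key is the hash of the value. */
    static final class ContentStore {
        private final BlockStore blocks;
        private final Map<String, Long> hashToBlock = new HashMap<>();

        ContentStore(BlockStore blocks) {
            this.blocks = blocks;
        }

        String put(byte[] content) {
            String hash = sha256Hex(content);
            // Identical content has the same hash, so it is written to layer 2 only once.
            hashToBlock.computeIfAbsent(hash, h -> blocks.put(content));
            return hash;
        }

        byte[] get(String hash) {
            Long key = hashToBlock.get(hash);
            return key == null ? null : blocks.get(key);
        }
    }

    /** Lower-case hexadecimal SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

A real implementation would also need to persist the hash-to-block index and perform the layer 1 disk I/O. Both are omitted here.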
* Current status
+- More or less defined [[id:f6764282-a6f6-44e6-8716-b428074dd093][Vision / goal]].
+
+- Collected some [[id:d2375acc-af14-4f18-8ad0-7949501178c5][ideas]].
+
- Implemented a very simple persistent key-value map.
+ - The long-term goal is to use it as a backing storage engine and
+ implement more advanced features on top of it.
+
+* See also
+Interesting or competing projects with good ideas:
+
++ GRAKN.AI
+ + Database in the form of a knowledge graph that uses machine
+ reasoning to simplify data processing challenges for AI
+ applications. https://grakn.ai/
+
++ [[http://wiki.squeak.org/squeak/2665][Magma]]
+ + Multi-user object database for Squeak
+
++ [[http://esug.org/data/ESUG2015/3%20wednesday/1100-1130%20SQL%20Queries%20on%20Smalltalk%20Objects/SQL%20Queries%20in%20Smalltalk%20(James%20Foster).pdf][Gemstone/S]]
+ + Completely distributed Smalltalk-based computing
+ system.
-Long term goal is to implement more advanced features on top of this.
++ [[http://www.uruk.org/emu/Taos.html][TAOS]]
+ + Completely distributed operating system/virtual machine.
-* TODO
-** check out Magma
- + http://wiki.squeak.org/squeak/2665
++ [[https://github.com/vygr/ChrysaLisp][ChrysaLisp]]
+ + Assembler/C-Script/Lisp 64 bit, MIMD, multi CPU, multi threaded,
+ multi core, multi user Parallel OS. With GUI, Terminal, OO
+ Assembler, Class libraries, C-Script compiler, Lisp interpreter,
+ Debugger, and more...