doc/index.org

   1 #+TITLE: Sixth Data - Data storage and computing engine
   2
   3 * (document settings) :noexport:
   4 ** use dark style for TWBS-HTML exporter
   5 #+HTML_HEAD: <link href="https://bootswatch.com/3/darkly/bootstrap.min.css" rel="stylesheet">
   6 #+HTML_HEAD: <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
   7 #+HTML_HEAD: <script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.min.js"></script>
   8 #+HTML_HEAD: <style type="text/css">
   9 #+HTML_HEAD:   footer {background-color: #111 !important;}
  10 #+HTML_HEAD:   pre {background-color: #111; color: #ccc;}
  11 #+HTML_HEAD: </style>
  12
  13 * General
  14 - This is a subproject of [[https://www3.svjatoslav.eu/projects/sixth/][Sixth]]
  15
  16 - This program is free software: you can redistribute it and/or modify
  17   it under the terms of the [[https://www.gnu.org/licenses/lgpl.html][GNU Lesser General Public License]] as
  18   published by the Free Software Foundation, either version 3 of the
  19   License, or (at your option) any later version.
  20
  21 - Program author:
  22   - Svjatoslav Agejenko
  23   - Homepage: https://svjatoslav.eu
  24   - Email: mailto:svjatoslav@svjatoslav.eu
  25
  26 - [[https://www.svjatoslav.eu/projects/][Other software projects hosted at svjatoslav.eu]]
  27
  28 ** Source code
  29 - [[https://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=snapshot;h=HEAD;sf=tgz][Download latest snapshot in TAR GZ format]]
  30
  31 - [[https://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=summary][Browse Git repository online]]
  32
  33 - Clone Git repository using command:
  34   : git clone https://www2.svjatoslav.eu/git/sixth-data.git
  35
  36 - See [[https://www3.svjatoslav.eu/projects/sixth-data/apidocs/][JavaDoc]].
  37
  38 * Vision / goal
  39   :PROPERTIES:
  40   :ID:       f6764282-a6f6-44e6-8716-b428074dd093
  41   :END:
  42 Provide a versioned, clustered, flexible, distributed, multi-dimensional
  43 data storage engine for the [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth computation engine]].
  44
  45 + Regarding traditional relational databases and object-oriented
  46   business applications:
  47
  48   + I hate the object-relational impedance mismatch.
  49
  50   + I don't like converting data between a persistent database and
  51     runtime objects for every transaction. How about creating a
  52     unified database/computation engine instead, in order to:
  53
  54   + Eliminate the constant moving and converting of data between two
  55     systems and make computing happen close to where the data is stored.
  56
  57   + Abstract away the difference between RAM and persistent storage.
  58     Let the system decide at runtime which data to keep in which kind
  59     of memory.
  60
  61 * Inspiration
  62 + Relational databases:
  63   + Transactional.
  64   + Indexable / Quickly searchable.
  65
  66 + Git (version control system)
  67   + Versionable
  68   + Branchable / mergeable.
  69   + Transparent consistency, checksumming and deduplication.
  70   + (Git as a database:
  71   https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/ )
  72
  73 ** Brain
  74   :PROPERTIES:
  75   :ID:       d2375acc-af14-4f18-8ad0-7949501178c5
  76   :END:
  77 + The brain appears to have a design with more than three dimensions:
  78   https://singularityhub.com/2017/06/21/is-there-a-multidimensional-mathematical-world-hidden-in-the-brains-computation/
  79
  80 + The brain appears to use geometry to map thoughts and even sounds:
  81   + https://www.quantamagazine.org/the-brain-maps-out-ideas-and-memories-like-spaces-20190114/
  82   + https://www.quantamagazine.org/goals-and-rewards-redraw-the-brains-map-of-the-world-20190328
  83
  84 + This directly inspires the [[id:171fe375-c737-41e6-b429-a414f6abc5d8][Geometrical computation]] idea and fits
  85   nicely with the [[id:01aa65c1-3d44-44a8-9b90-58454bc6be80][CM-1 Connection Machine]] design.
  86
  87 ** CM-1 Connection Machine
  88 :PROPERTIES:
  89 :ID:       01aa65c1-3d44-44a8-9b90-58454bc6be80
  90 :END:
  91 https://en.wikipedia.org/wiki/Connection_Machine
  92
  93 + see: [[id:171fe375-c737-41e6-b429-a414f6abc5d8][Geometrical computation]]
  94
  95 + Each computation unit has its own local CPU and RAM.
  96
  97 + Data is pre-distributed across computation units.
  98
  99 + The machine's internal 12-dimensional hypercube network makes it
 100   possible to efficiently simulate a network topology of arbitrary
 101   dimensionality between computation units. When we are solving or
 102   simulating, for example, a 5-dimensional problem, we can arrange the
 103   computation units into a virtual 5D network.
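
As a rough illustration of the hypercube idea, the sketch below assumes that nodes are numbered 0 .. 2^d - 1 and that two nodes are directly linked when their numbers differ in exactly one bit (the usual definition of a hypercube network). The function names are invented for this example and it is not CM-1 routing code. It also shows how a reflected Gray code can lay a one-dimensional ring onto the hypercube so that ring neighbours remain directly linked.

```python
def hypercube_neighbours(node: int, dimensions: int) -> list[int]:
    """Return the nodes directly linked to `node` in a `dimensions`-D hypercube."""
    # Flipping one bit of the node number moves along one hypercube axis.
    return [node ^ (1 << axis) for axis in range(dimensions)]


def gray_code(index: int) -> int:
    """Reflected binary Gray code: consecutive indexes differ in exactly one bit."""
    return index ^ (index >> 1)


def ring_on_hypercube(dimensions: int) -> list[int]:
    """Assign each position of a ring of 2**dimensions elements to a hypercube node."""
    return [gray_code(position) for position in range(2 ** dimensions)]
```

For a 3-dimensional hypercube, =ring_on_hypercube(3)= yields the node order 0, 1, 3, 2, 6, 7, 5, 4, where each step (including the wrap-around from 4 back to 0) changes a single bit. A multi-dimensional grid with power-of-two side lengths can be treated the same way by giving each grid axis its own group of bits.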
 104
 105 * Ideas
 106 ** Geometrical computation
 107 :PROPERTIES:
 108 :ID:       171fe375-c737-41e6-b429-a414f6abc5d8
 109 :END:
 110 + Inspired by [[id:d2375acc-af14-4f18-8ad0-7949501178c5][Brain]].
 111 + Fits nicely with [[id:01aa65c1-3d44-44a8-9b90-58454bc6be80][CM-1 Connection Machine]] properties.
 112
 113 *** Distributed computation and data storage
 114    :PROPERTIES:
 115    :ID:       5d287158-53ea-44a2-a754-dd862366066a
 116    :END:
 117 Maybe every problem can be translated to geometry (use any shapes and
 118 as many dimensions as you need). Solutions to such problems would
 119 then appear as relatively simple search/comparison/lookup results. As
 120 a bonus, such geometrical *data storage* AND *computation* can
 121 naturally be made *parallel* and *distributed*. That's what neurons in
 122 the brain appear to be doing! :) Learning means building/updating
 123 the model (the hard part). Question answering means making (relatively
 124 simple) lookups (geometrical queries) against the model.
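
To make "question answering as a geometrical lookup" slightly more concrete, here is a minimal sketch. It assumes the model is simply a set of labelled points and that a query is answered by the nearest stored point (Euclidean distance). The labels, coordinates and function name are invented for illustration and say nothing about how Sixth Data will actually store its model.

```python
import math

# A toy "model": labelled points in 3-dimensional space (made-up data).
MODEL = {
    "cat": (0.9, 0.1, 0.3),
    "dog": (0.8, 0.2, 0.7),
    "car": (0.1, 0.9, 0.5),
}


def nearest(query: tuple[float, ...], model: dict[str, tuple[float, ...]]) -> str:
    """Return the label of the stored point that lies closest to `query`."""
    # Every candidate is measured independently of the others, so this
    # comparison could be split across many computation units.
    return min(model, key=lambda label: math.dist(query, model[label]))
```

Building =MODEL= is the "learning" step. The lookup itself is a plain comparison over the stored points.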
 125 *** Mapping of hyperspace to traditional object-oriented model
 126    :PROPERTIES:
 127    :ID:       a117c11e-97c1-4822-88b2-9fc10f96caec
 128    :END:
 129 Object-oriented programming is inspired by the way the human mind
 130 operates. It allows the programmer to express ideas to the computer in
 131 more human-like terms.
 132
 133 It is possible to map an object model to a geometrical hyperspace:
 134
 135 + An object is a point in space (universe). Each member variable of an
 136   object translates to its own dimension. That is, if a class declares
 137   4 variables, then each of its objects can be stored as a single point
 138   inside 4-dimensional space. Variable values translate to point
 139   coordinates: an integer, a floating point number and even a boolean
 140   or a string can be translated to a linear value that can be used as
 141   a coordinate along a particular dimension.
 142
 143 + Each class declares its own space (universe). All class instances
 144   (objects) are points inside that particular universe. References
 145   between objects of different types are hyperlinks (portals) between
 146   different universes.
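
The mapping described above could be sketched roughly as follows. The =Book= class, its fields and the string encoding are assumptions made for this example only. In particular, the string rule (first 4 UTF-8 bytes read as a fraction) is lossy and is just one conceivable way to obtain a linear value.

```python
from dataclasses import dataclass, fields


@dataclass
class Book:  # hypothetical class, used only for this example
    pages: int
    price: float
    in_print: bool
    title: str


def to_coordinate(value) -> float:
    """Translate one member variable into a linear value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # Assumption: read the first 4 UTF-8 bytes as a base-256 fraction,
        # so coordinates roughly follow lexicographic order.
        data = value.encode("utf-8")[:4].ljust(4, b"\0")
        return int.from_bytes(data, "big") / 256 ** 4
    raise TypeError(f"no mapping for {type(value).__name__}")


def to_point(obj) -> tuple[float, ...]:
    """One dimension per member variable, in declaration order."""
    return tuple(to_coordinate(getattr(obj, f.name)) for f in fields(obj))
```

An instance of =Book= has four member variables, so =to_point= returns a point with four coordinates inside the "Book universe".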
 147 *** Handling of relations
 148    :PROPERTIES:
 149    :ID:       b6b15bd2-c78b-4c51-a343-72843a515c29
 150    :END:
 151 Suppose we want to create a database of books and authors. A book can
 152 have multiple authors, and a single person can be an author of multiple
 153 books. It is possible to store how many hours of work each author has
 154 contributed to every book, using hyperspace as follows:
 155
 156 + Every dimension corresponds to one particular book author (10
 157   authors in the database would require a 10-dimensional space).
 158   + A point in space corresponds to one particular book.
 159     + The point's location along a particular (author) dimension
 160       corresponds to the amount of work contributed by that author to
 161       the given book.
 162
 163 Alternatively:
 164
 165 + Every dimension corresponds to one particular book.
 166   + A point in space corresponds to one particular author in the entire
 167     database.
 168     + The point's location along a particular (book) dimension corresponds
 169       to the amount of work contributed to that book by the given author.
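
A small sketch of the first arrangement (one dimension per author, one point per book) is shown below. The author names, book titles and hour values are made up, and plain tuples stand in for whatever point representation the storage engine would really use.

```python
AUTHORS = ["Alice", "Bob", "Carol"]  # one dimension per author (made-up names)

# Each book is a point; coordinate i holds the hours contributed by AUTHORS[i].
BOOKS = {
    "Book A": (120.0, 0.0, 40.0),
    "Book B": (0.0, 300.0, 0.0),
    "Book C": (10.0, 25.0, 0.0),
}


def authors_of(book: str) -> dict[str, float]:
    """Authors with a non-zero coordinate for the given book."""
    return {a: hours for a, hours in zip(AUTHORS, BOOKS[book]) if hours > 0}


def books_by(author: str) -> dict[str, float]:
    """Read a single author dimension across all book points."""
    axis = AUTHORS.index(author)
    return {b: point[axis] for b, point in BOOKS.items() if point[axis] > 0}
```

Reading one author's axis across all books, as =books_by= does, produces exactly the coordinates that the alternative arrangement would store in that author's point, so both layouts carry the same information.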
 170
 171 ** Layered architecture
 172 + layer 1 :: disk / block storage / partition
 173
 174 + layer 2 :: key/value storage. Keys are unique and are dictated by
 175              the storage engine. A value is an arbitrary byte array
 176              of limited size. This layer is responsible for handling
 177              disk defragmentation and for consistency during crash
 178              recovery.
 179
 180 + layer 3 :: key/value storage. Keys are content hashes. Values are
 181              arbitrary content byte arrays of limited size. This
 182              layer effectively implements content-addressable
 183              storage. Content-addressable storage enables Git-like
 184              behavior (possibility of competing branches, retaining
 185              history, transparent deduplication).
 186
 187 + layer 4 :: Implements an arbitrary-dimensional multiverse.
 188
 189 + layer 5 :: Distributed computation engine.
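
To illustrate what layer 3 is meant to provide, here is a minimal content-addressable store. Several things are assumptions of this sketch and not part of the design above: SHA-256 as the hash function, the 4096-byte size limit, the class and method names, and the use of an in-memory dict in place of layer 2 (which would assign its own keys, so a real layer 3 would need a hash-to-key index).

```python
import hashlib


class ContentStore:
    """Layer 3 sketch: values are addressed by the hash of their content."""

    def __init__(self, max_size: int = 4096):
        self._blocks: dict[bytes, bytes] = {}  # stand-in for the layer 2 store
        self._max_size = max_size              # values are of limited size

    def put(self, content: bytes) -> bytes:
        """Store `content` and return its key (the SHA-256 digest)."""
        if len(content) > self._max_size:
            raise ValueError("content exceeds the size limit")
        key = hashlib.sha256(content).digest()
        # Identical content always produces the same key, so storing it
        # twice keeps a single copy (transparent deduplication).
        self._blocks.setdefault(key, content)
        return key

    def get(self, key: bytes) -> bytes:
        """Return the content for `key`, verifying its checksum."""
        content = self._blocks[key]
        if hashlib.sha256(content).digest() != key:
            raise IOError("checksum mismatch")
        return content

    def __len__(self) -> int:
        return len(self._blocks)
```

Because a key is derived from the content, stored values never change in place. History and competing branches can then be expressed as values that contain the keys of other values, which is the Git-like behavior mentioned for layer 3.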
 190 * Current status
 191 - More or less defined [[id:f6764282-a6f6-44e6-8716-b428074dd093][Vision / goal]].
 192
 193 - Collected some [[id:d2375acc-af14-4f18-8ad0-7949501178c5][ideas]].
 194
 195 - Implemented a very simple persistent key-value map.
 196   - The long-term goal is to use it as a backing storage engine and
 197     implement more advanced features on top of it.
 198
 199 * See also
 200 Interesting or competing projects with good ideas:
 201
 202 + [[id:01aa65c1-3d44-44a8-9b90-58454bc6be80][CM-1 Connection Machine]]
 203
 204 + GRAKN.AI
 205   + Database in the form of a knowledge graph that uses machine
 206     reasoning to simplify data processing challenges for AI
 207     applications. https://grakn.ai/
 208
 209 + [[http://wiki.squeak.org/squeak/2665][Magma]]
 210   + Multi-user object database for Squeak
 211
 212 + [[http://esug.org/data/ESUG2015/3%20wednesday/1100-1130%20SQL%20Queries%20on%20Smalltalk%20Objects/SQL%20Queries%20in%20Smalltalk%20(James%20Foster).pdf][GemStone/S]]
 213   + Completely distributed Smalltalk-based computing
 214     system.
 215
 216 + [[http://www.uruk.org/emu/Taos.html][TAOS]]
 217   + Completely distributed operating system/virtual machine.
 218
 219 + [[https://github.com/vygr/ChrysaLisp][ChrysaLisp]]
 220   + Assembler/C-Script/Lisp 64 bit, MIMD, multi CPU, multi threaded,
 221     multi core, multi user Parallel OS. With GUI, Terminal, OO
 222     Assembler, Class libraries, C-Script compiler, Lisp interpreter,
 223     Debugger, and more...