#+TITLE: Sixth Data - Data storage and computing engine

* (document settings) :noexport:
** use dark style for TWBS-HTML exporter
#+HTML_HEAD: <link href="https://bootswatch.com/3/darkly/bootstrap.min.css" rel="stylesheet">
#+HTML_HEAD: <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
#+HTML_HEAD: <script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.min.js"></script>
#+HTML_HEAD: <style type="text/css">
#+HTML_HEAD:   footer {background-color: #111 !important;}
#+HTML_HEAD:   pre {background-color: #111; color: #ccc;}
#+HTML_HEAD: </style>
- This is a subproject of [[https://www3.svjatoslav.eu/projects/sixth/][Sixth]]

- This program is free software: you can redistribute it and/or modify
  it under the terms of the [[https://www.gnu.org/licenses/lgpl.html][GNU Lesser General Public License]] as
  published by the Free Software Foundation, either version 3 of the
  License, or (at your option) any later version.

- Homepage: https://svjatoslav.eu
- Email: [[mailto:svjatoslav@svjatoslav.eu][svjatoslav@svjatoslav.eu]]

- [[https://www.svjatoslav.eu/projects/][Other software projects hosted at svjatoslav.eu]]

- [[https://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=snapshot;h=HEAD;sf=tgz][Download latest snapshot in TAR GZ format]]

- [[https://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=summary][Browse Git repository online]]

- Clone Git repository using command:
  : git clone https://www2.svjatoslav.eu/git/sixth-data.git

- See [[https://www3.svjatoslav.eu/projects/sixth-data/apidocs/][JavaDoc]].
* Vision / goal
:PROPERTIES:
:ID: f6764282-a6f6-44e6-8716-b428074dd093
:END:
Provide a versioned, clustered, flexible, distributed, multi-dimensional
data storage engine for the [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth computation engine]].
+ Speaking of traditional relational databases and object-oriented
  business applications:

  + I hate the object-relational impedance mismatch.

  + I don't like converting data between the persistent database and
    runtime objects for every transaction. How about creating a
    unified database/computation engine instead, to:

    + Eliminate the constant moving and converting of data between
      two systems and make computation happen close to where the
      data is stored.

    + Abstract away the difference between RAM and persistent
      storage. Let the system decide at runtime which data to keep
      in what kind of storage.
+ Relational databases:

  + Indexable / quickly searchable.

+ Git (version control system):

  + Branchable / mergeable.
  + Transparent consistency, checksumming and deduplication.
  + See: https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/
:ID: d2375acc-af14-4f18-8ad0-7949501178c5

+ The brain appears to have a more than three-dimensional design:
  https://singularityhub.com/2017/06/21/is-there-a-multidimensional-mathematical-world-hidden-in-the-brains-computation/

+ The brain appears to use geometry to map thoughts and even sounds:
  + https://www.quantamagazine.org/the-brain-maps-out-ideas-and-memories-like-spaces-20190114/
  + https://www.quantamagazine.org/goals-and-rewards-redraw-the-brains-map-of-the-world-20190328

+ It directly inspires the [[id:171fe375-c737-41e6-b429-a414f6abc5d8][Geometrical computation]] idea and fits nicely
  with the [[id:01aa65c1-3d44-44a8-9b90-58454bc6be80][CM-1 Connection Machine]] design.
** CM-1 Connection Machine
:PROPERTIES:
:ID: 01aa65c1-3d44-44a8-9b90-58454bc6be80
:END:

https://en.wikipedia.org/wiki/Connection_Machine

+ See: [[id:171fe375-c737-41e6-b429-a414f6abc5d8][Geometrical computation]]

+ Each computation unit has a local CPU and RAM.

+ Data is pre-distributed across computation units.

+ The machine's internal 12-dimensional hypercube network makes it
  possible to efficiently simulate a network topology of arbitrary
  dimensionality between computation units. For example, when solving
  or simulating a 5-dimensional problem, the computation units can be
  arranged into a virtual 5D network.
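
The hypercube idea above can be illustrated with a minimal sketch. It
is not taken from the CM-1 or from this project: the class name
~HypercubeSketch~, the helper ~embed~ and the split of 12 address bits
into 3+3+2+2+2 bits per virtual axis are assumptions made for the
example. It relies on two well-known properties: hypercube neighbors
differ in exactly one address bit, and consecutive integers in Gray
code also differ in exactly one bit.

#+BEGIN_SRC java
/** Hypothetical sketch: addressing nodes of a 12-dimensional hypercube. */
final class HypercubeSketch {
    /** Number of hypercube dimensions; 2^12 = 4096 node addresses. */
    static final int DIMENSIONS = 12;

    /** Neighbor of a node along one hypercube dimension: flip that address bit. */
    static int neighbor(int node, int dimension) {
        return node ^ (1 << dimension);
    }

    /** Minimal hop count between two nodes: number of differing address bits. */
    static int hops(int a, int b) {
        return Integer.bitCount(a ^ b);
    }

    /** Binary-reflected Gray code: consecutive integers differ in exactly one bit. */
    static int gray(int i) {
        return i ^ (i >>> 1);
    }

    /**
     * Places a cell of a virtual grid onto a hypercube node (assumed layout).
     * Axis k receives bitsPerAxis[k] address bits and may hold
     * coordinates in the range 0 .. 2^bitsPerAxis[k] - 1.
     */
    static int embed(int[] coordinates, int[] bitsPerAxis) {
        int address = 0;
        for (int axis = 0; axis < coordinates.length; axis++) {
            address = (address << bitsPerAxis[axis]) | gray(coordinates[axis]);
        }
        return address;
    }

    public static void main(String[] args) {
        int[] bitsPerAxis = {3, 3, 2, 2, 2}; // virtual 5D grid, 12 bits in total
        int a = embed(new int[] {3, 5, 1, 2, 0}, bitsPerAxis);
        int b = embed(new int[] {4, 5, 1, 2, 0}, bitsPerAxis); // one step along axis 0
        System.out.println("hops between grid neighbors: " + hops(a, b));
    }
}
#+END_SRC

With this layout, cells that are adjacent in the virtual 5D grid land
on nodes that are directly connected in the hypercube, so a
neighbor-to-neighbor message needs a single hop.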
** Geometrical computation
:PROPERTIES:
:ID: 171fe375-c737-41e6-b429-a414f6abc5d8
:END:

+ Inspired by [[id:d2375acc-af14-4f18-8ad0-7949501178c5][Brain]].
+ Fits nicely with [[id:01aa65c1-3d44-44a8-9b90-58454bc6be80][CM-1 Connection Machine]] properties.
*** Distributed computation and data storage
:PROPERTIES:
:ID: 5d287158-53ea-44a2-a754-dd862366066a
:END:

Maybe every problem can be translated to geometry (using any shapes
and as many dimensions as needed). Solutions to such problems would
then appear as relatively simple search/comparison/lookup results. As
a bonus, such geometrical *data storage* and *computation* can
naturally be made *parallel* and *distributed*. That's what neurons in
the brain appear to be doing! :) Learning means building and updating
the model (the hard part). Answering a question means making
(relatively simple) lookups (geometrical queries) against the model.
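
As a rough illustration of "answering is a lookup against the model",
the sketch below stores labelled points and answers a query with the
label of the nearest stored point. Everything in it is an assumption
made for the example: the class ~GeometricLookupSketch~, its methods
~learn~ and ~query~, and the choice of Euclidean distance. It is not
the storage engine's API.

#+BEGIN_SRC java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a "model" made of labelled points in n-dimensional space. */
final class GeometricLookupSketch {
    private final List<double[]> points = new ArrayList<>();
    private final List<String> answers = new ArrayList<>();

    /** Learning: add one labelled point to the model. */
    void learn(double[] point, String answer) {
        points.add(point.clone());
        answers.add(answer);
    }

    /** Query: return the answer attached to the nearest point, or null if the model is empty. */
    String query(double[] question) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < points.size(); i++) {
            double distance = squaredDistance(points.get(i), question);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = answers.get(i);
            }
        }
        return best;
    }

    /** Squared Euclidean distance; enough for comparing which point is closer. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        GeometricLookupSketch model = new GeometricLookupSketch();
        model.learn(new double[] {0, 0}, "origin");
        model.learn(new double[] {10, 10}, "far corner");
        System.out.println(model.query(new double[] {1, 2}));
    }
}
#+END_SRC

Each stored point is compared with the query independently of the
others, so the scan could be split across computation units that each
hold a part of the points; only the per-unit best candidates would
need to be combined.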
*** Mapping of hyperspace to traditional object-oriented model
:PROPERTIES:
:ID: a117c11e-97c1-4822-88b2-9fc10f96caec
:END:

Object-oriented programming is inspired by the way the human mind
operates. It allows the programmer to express ideas to the computer in
a more natural way.

It is possible to map an object model to geometrical hyperspace:

+ An object is a point in space (universe). Each member variable of
  the object translates to its own dimension. That is, if a class
  declares 4 variables for an object, the corresponding object can be
  stored as a single point inside 4-dimensional space. Variable values
  translate to point coordinates in space: an integer, a floating
  point number, and even a boolean or a string can be translated to a
  linear value that is used as a coordinate along a particular
  dimension.

+ Each class declares its own space (universe). All class instances
  (objects) are points inside that particular universe. References
  between objects of different types are hyperlinks (portals) between
  universes.
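
A minimal sketch of this mapping follows. The ~Person~ class, its four
fields and the string-to-number rule are hypothetical and were chosen
only for illustration; the text above does not prescribe how a string
becomes a coordinate. Here the first three UTF-16 characters are read
as a base-65536 fraction, which keeps lexicographic order for those
characters.

#+BEGIN_SRC java
/** Hypothetical sketch: an object with 4 member variables becomes a point in 4D space. */
final class ObjectToPointSketch {

    /** Example class (an assumption for this sketch) with four member variables. */
    static final class Person {
        final int age;
        final double heightMeters;
        final boolean employed;
        final String name;

        Person(int age, double heightMeters, boolean employed, String name) {
            this.age = age;
            this.heightMeters = heightMeters;
            this.employed = employed;
            this.name = name;
        }
    }

    /**
     * One possible linear value for a string: its first three characters
     * read as digits of a base-65536 fraction in the range [0, 1).
     * Strings sharing the same first three characters get the same value.
     */
    static double stringToCoordinate(String s) {
        double value = 0;
        double scale = 1;
        for (int i = 0; i < Math.min(s.length(), 3); i++) {
            scale /= 65536.0;
            value += s.charAt(i) * scale;
        }
        return value;
    }

    /** One coordinate per member variable, in declaration order. */
    static double[] toPoint(Person p) {
        return new double[] {
            p.age,
            p.heightMeters,
            p.employed ? 1.0 : 0.0,
            stringToCoordinate(p.name)
        };
    }

    public static void main(String[] args) {
        double[] point = toPoint(new Person(42, 1.8, true, "Alice"));
        System.out.println(java.util.Arrays.toString(point));
    }
}
#+END_SRC

A reference to an object of another class would not be a coordinate in
this scheme; following the text, it would be stored as a link to a
point in that other class's universe.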
*** Handling of relations
:PROPERTIES:
:ID: b6b15bd2-c78b-4c51-a343-72843a515c29
:END:

Suppose we want to create a database of books and authors. A book can
have multiple authors, and a single person can be an author of
multiple books. The number of hours each author has contributed to
every book can be stored using hyperspace as follows:

+ Every dimension corresponds to one particular book author (10
  authors in the database would require a 10-dimensional space).
+ A point in space corresponds to one particular book.
+ The point's location along a particular (author) dimension
  corresponds to the amount of work contributed by that author to the
  given book.

Alternatively:

+ Every dimension corresponds to one particular book.
+ A point in space corresponds to one particular author in the entire
  database.
+ The point's location along a particular (book) dimension
  corresponds to the amount of work contributed to that book by the
  given author (point).
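
The two views describe the same numbers from two directions, which a
small sketch can make concrete. The class ~RelationSpaceSketch~ and
its methods are hypothetical; a dense array is used only to keep the
example short and is not meant to suggest how the engine stores data.

#+BEGIN_SRC java
/** Hypothetical sketch: hours of work per (book, author) pair, seen as points in two spaces. */
final class RelationSpaceSketch {
    /** hours[book][author] = hours the author contributed to the book. */
    private final double[][] hours;

    RelationSpaceSketch(int bookCount, int authorCount) {
        hours = new double[bookCount][authorCount];
    }

    void setHours(int book, int author, double value) {
        hours[book][author] = value;
    }

    /** First view: a book is a point whose dimensions are the authors. */
    double[] bookPoint(int book) {
        return hours[book].clone();
    }

    /** Second view: an author is a point whose dimensions are the books. */
    double[] authorPoint(int author) {
        double[] point = new double[hours.length];
        for (int book = 0; book < hours.length; book++) {
            point[book] = hours[book][author];
        }
        return point;
    }

    public static void main(String[] args) {
        RelationSpaceSketch db = new RelationSpaceSketch(3, 2); // 3 books, 2 authors
        db.setHours(0, 1, 40); // author 1 spent 40 hours on book 0
        db.setHours(2, 1, 15); // author 1 spent 15 hours on book 2
        System.out.println(java.util.Arrays.toString(db.bookPoint(0)));
        System.out.println(java.util.Arrays.toString(db.authorPoint(1)));
    }
}
#+END_SRC

In this sketch a book point has one coordinate per author and an
author point has one coordinate per book; a coordinate of zero means
the author did not work on that book.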
** Layered architecture
+ layer 1 :: Disk / block storage / partition.

+ layer 2 :: Key/value storage. Keys are unique and are dictated by
  the storage engine. A value is an arbitrary byte array of limited
  size. This layer is responsible for handling disk defragmentation
  and consistency in case of a crash.

+ layer 3 :: Key/value storage. Keys are content hashes. Values are
  arbitrary content byte arrays of limited size. This layer
  effectively implements content-addressable storage.
  Content-addressable storage enables Git-like behavior (the
  possibility of competing branches, retained history, transparent
  deduplication).

+ layer 4 :: Implements an arbitrary-dimensional multiverse.

+ layer 5 :: Distributed computation engine.
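
To make the layer 3 idea concrete, here is a minimal sketch of
content-addressable storage. It is not the project's implementation:
the class ~ContentStoreSketch~ is hypothetical, an in-memory ~HashMap~
stands in for the layer 2 key/value storage, and SHA-256 is an assumed
choice of hash function.

#+BEGIN_SRC java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of layer 3: values are stored under the hash of their content. */
final class ContentStoreSketch {
    /** Stand-in for the layer 2 key/value storage (in memory only). */
    private final Map<String, byte[]> blocks = new HashMap<>();

    /** Stores the content and returns its key, the hex SHA-256 digest of the bytes. */
    String put(byte[] content) {
        String key = hash(content);
        // Identical content always hashes to the same key, so it is kept only once.
        blocks.putIfAbsent(key, content.clone());
        return key;
    }

    /** Returns a copy of the stored content, or null if the key is unknown. */
    byte[] get(String key) {
        byte[] value = blocks.get(key);
        return value == null ? null : value.clone();
    }

    /** Number of distinct content blocks currently stored. */
    int size() {
        return blocks.size();
    }

    /** Hex-encoded SHA-256 digest of the given bytes. */
    static String hash(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    public static void main(String[] args) {
        ContentStoreSketch store = new ContentStoreSketch();
        String first = store.put("same bytes".getBytes(java.nio.charset.StandardCharsets.UTF_8));
        String second = store.put("same bytes".getBytes(java.nio.charset.StandardCharsets.UTF_8));
        System.out.println(first.equals(second) + ", blocks stored: " + store.size());
    }
}
#+END_SRC

Because a key is derived from the content, a stored block never
changes under its key. History and branches can then be expressed as
blocks that contain the keys of other blocks, which is roughly how Git
builds commits and trees on top of its object store.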
- More or less defined the [[id:f6764282-a6f6-44e6-8716-b428074dd093][Vision / goal]].

- Collected some [[id:d2375acc-af14-4f18-8ad0-7949501178c5][ideas]].

- Implemented a very simple persistent key-value map.
  - The long-term goal is to use it as a backing storage engine and
    implement more advanced features on top of it.
Interesting or competing projects with good ideas:

+ [[id:01aa65c1-3d44-44a8-9b90-58454bc6be80][CM-1 Connection Machine]]

+ A database in the form of a knowledge graph that uses machine
  reasoning to simplify data processing challenges for AI
  applications: https://grakn.ai/

+ [[http://wiki.squeak.org/squeak/2665][Magma]]
  + Multi-user object database for Squeak.

+ [[http://esug.org/data/ESUG2015/3%20wednesday/1100-1130%20SQL%20Queries%20on%20Smalltalk%20Objects/SQL%20Queries%20in%20Smalltalk%20(James%20Foster).pdf][Gemstone/S]]
  + Completely distributed Smalltalk-based computing

+ [[http://www.uruk.org/emu/Taos.html][TAOS]]
  + Completely distributed operating system/virtual machine.

+ [[https://github.com/vygr/ChrysaLisp][ChrysaLisp]]
  + Assembler/C-Script/Lisp 64 bit, MIMD, multi CPU, multi threaded,
    multi core, multi user Parallel OS. With GUI, Terminal, OO
    Assembler, Class libraries, C-Script compiler, Lisp interpreter,
    Debugger, and more...