#+TITLE: Sixth Data - Data storage and computing engine

* (document settings) :noexport:
** use dark style for TWBS-HTML exporter
#+HTML_HEAD:
#+HTML_HEAD:
#+HTML_HEAD:
#+HTML_HEAD:

* General
- This is a subproject of [[http://www3.svjatoslav.eu/projects/sixth/][Sixth]]
- This program is free software: you can redistribute it and/or modify it under the terms of the [[https://www.gnu.org/licenses/lgpl.html][GNU Lesser General Public License]] as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
- Program author:
  - Svjatoslav Agejenko
  - Homepage: http://svjatoslav.eu
  - Email: mailto://svjatoslav@svjatoslav.eu
- [[http://www.svjatoslav.eu/projects/][Other software projects hosted at svjatoslav.eu]]

** Source code
- [[http://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=snapshot;h=HEAD;sf=tgz][Download latest snapshot in TAR GZ format]]
- [[http://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=summary][Browse Git repository online]]
- Clone Git repository using command:
  : git clone http://www2.svjatoslav.eu/git/sixth-data.git

* Vision / goal
:PROPERTIES:
:ID: f6764282-a6f6-44e6-8716-b428074dd093
:END:

Provide a versioned, clustered, flexible, distributed, multi-dimensional data storage engine for the [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth computation engine]].

+ Speaking of traditional relational databases and object-oriented business applications:
  + I hate the object-relational impedance mismatch.
  + I don't like converting data between the persistent database and runtime objects for every transaction.

How about creating a unified database/computation engine instead, to:
+ Eliminate the constant moving and converting of data between two systems, and make computation happen close to where the data is stored.
+ Abstract away the difference between RAM and persistent storage. Let the system decide at runtime which data to keep in which kind of memory.

* Inspiration
+ Relational databases:
  + Transactional.
  + Indexable / quickly searchable.
+ Git (version control system):
  + Versionable.
  + Branchable / mergeable.
  + Transparent consistency, checksumming and deduplication.
  + (Git as a database: https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/ )

** Brain
:PROPERTIES:
:ID: d2375acc-af14-4f18-8ad0-7949501178c5
:END:
+ Appears to have a design with more than 3 dimensions. (Food for thought...)
  + https://singularityhub.com/2017/06/21/is-there-a-multidimensional-mathematical-world-hidden-in-the-brains-computation/
+ It directly inspires the following ideas:
  + [[id:5d287158-53ea-44a2-a754-dd862366066a][Distributed computation and data storage]]
  + [[id:a117c11e-97c1-4822-88b2-9fc10f96caec][Mapping of hyperspace to traditional object-oriented model]]
  + [[id:b6b15bd2-c78b-4c51-a343-72843a515c29][Handling of relations]]

* Ideas
** Distributed computation and data storage
:PROPERTIES:
:ID: 5d287158-53ea-44a2-a754-dd862366066a
:END:

Maybe every problem can be translated to geometry (use any shapes and as many dimensions as you need). Solution(s) to such problems would then appear as relatively simple search/comparison/lookup results. As a bonus, such geometrical *data storage* AND *computation* can naturally be made *parallel* and *distributed*. That's what neurons in the brain appear to be doing! :)

Learning means building/updating the model (the hard part). Question answering means making (relatively simple) lookups (geometrical queries) against the model.

** Mapping of hyperspace to traditional object-oriented model
:PROPERTIES:
:ID: a117c11e-97c1-4822-88b2-9fc10f96caec
:END:

Object-oriented programming is inspired by the way the human mind operates. It allows the programmer to express ideas to the computer in more human-like terms.

It is possible to map the object model to a geometrical hyperspace:

+ An object is a point in space (universe). Each object member variable translates to its own dimension.
  That is, if a class declares 4 variables for an object, then the corresponding object can be stored as a single point in 4-dimensional space. Variable values translate to point coordinates in that space. Integers, floating point numbers and even booleans and strings can be translated to a linear value that can be used as a coordinate along a particular dimension.
+ Each class declares its own space (universe). All class instances (objects) are points inside that particular universe. References between objects of different types are hyperlinks (portals) between different universes.

** Handling of relations
:PROPERTIES:
:ID: b6b15bd2-c78b-4c51-a343-72843a515c29
:END:

Suppose we want to create a database of books and authors. A book can have multiple authors, and a single person can be an author of multiple books. It is possible to store how many hours of work each author has contributed to every book, using hyperspace as follows:

+ Every dimension corresponds to one particular book author. (10 authors in the database would require a 10-dimensional space.)
+ A point in space corresponds to one particular book.
+ The point's location along a particular (author) dimension corresponds to the amount of work contributed by that author to the given book.

Alternatively:

+ Every dimension corresponds to one particular book.
+ A point in space corresponds to one particular author in the entire database.
+ The point's location along a particular (book) dimension corresponds to the amount of work contributed to that book by the given author (point).

** Layered architecture
+ layer 1 :: disk / block storage / partition
+ layer 2 :: key/value storage. Keys are unique and are dictated by the storage engine. Each value is an arbitrary byte array of limited size. This layer is responsible for handling disk defragmentation and consistency in case of crash recovery.
+ layer 3 :: key/value storage. Keys are content hashes. Values are arbitrary content byte arrays of limited size. This layer effectively implements content-addressable storage.
  Content-addressable storage enables Git-like behavior (possibility of competing branches, retaining history, transparent deduplication).
+ layer 4 :: Implements an arbitrary-dimensional multiverse.
+ layer 5 :: Distributed computation engine.

* Current status
- More or less defined the [[id:f6764282-a6f6-44e6-8716-b428074dd093][Vision / goal]].
- Collected some [[id:d2375acc-af14-4f18-8ad0-7949501178c5][ideas]].
- Implemented a very simple persistent key-value map.
- The long-term goal is to use it as a backing storage engine and implement more advanced features on top of it.

* See also
Interesting or competing projects with good ideas:
+ GRAKN.AI
  + Database in the form of a knowledge graph that uses machine reasoning to simplify data processing challenges for AI applications. https://grakn.ai/
+ Magma
  + Multi-user object database for Squeak. http://wiki.squeak.org/squeak/2665
+ Gemstone/S
  + Completely distributed Smalltalk-based computing system. http://esug.org/data/ESUG2015/3%20wednesday/1100-1130%20SQL%20Queries%20on%20Smalltalk%20Objects/SQL%20Queries%20in%20Smalltalk%20(James%20Foster).pdf
+ TAOS
  + Completely distributed operating system/virtual machine. http://www.uruk.org/emu/Taos.html
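
* Example sketch: content-addressable layer
The layer 3 idea from the layered architecture (keys are content hashes, values are limited-size byte arrays) could be sketched roughly as follows. This is only an illustration, not the project's implementation: the class name =ContentStore=, the choice of SHA-256, the 4096-byte size limit and the in-memory dictionary standing in for layer 2 are all assumptions made for the example.

```python
import hashlib


class ContentStore:
    """Hypothetical sketch of a content-addressable key/value layer."""

    def __init__(self, max_value_size=4096):
        # Assumption: a plain dict stands in for the layer 2 key/value storage.
        self._blobs = {}
        # Assumption: an arbitrary limit, since values are "limited size".
        self.max_value_size = max_value_size

    def put(self, content: bytes) -> str:
        """Store content and return its key (the hex SHA-256 of the content)."""
        if len(content) > self.max_value_size:
            raise ValueError("value exceeds size limit")
        key = hashlib.sha256(content).hexdigest()
        # Identical content hashes to the same key, so it is stored only once.
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        """Return the content stored under the given hash key."""
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
first = store.put(b"hello")
second = store.put(b"hello")   # same content, same key
other = store.put(b"world")
```

Because the key is derived from the content, storing the same bytes twice yields the same key and a single stored copy, which is the transparent deduplication mentioned for layer 3. Changed content gets a new key and leaves the old entry intact, which is what makes retaining history possible.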