Sixth - system for data storage, computation, exploration and interaction
-Table of Contents
--
-
-
- This is a subproject of Sixth - -
- download latest - snapshot - -
- This program is free software; you can redistribute it and/or modify - it under the terms of version 3 of the GNU Lesser General - Public - License or later as published by the Free Software Foundation. - - -
- Program author:
-
-
-
- Svjatoslav Agejenko -
- Homepage: http://svjatoslav.eu/ -
- Email: mailto://svjatoslav@svjatoslav.eu/ -
-
- - other applications hosted at svjatoslav.eu -
1 Current status
--
-
- Implemented very simple persistent key-value map. -
- Long term goal is to implement more advanced features on top of this. -
-Sixth Data - Data storage and computing engine
+ +1 General
+-
+
- This is a subproject of Sixth + + +
- This program is free software: you can redistribute it and/or modify +it under the terms of the GNU Lesser General Public License as +published by the Free Software Foundation, either version 3 of the +License, or (at your option) any later version. + + +
- Program author:
+
-
+
- Svjatoslav Agejenko + +
- Homepage: http://svjatoslav.eu + +
- Email: mailto://svjatoslav@svjatoslav.eu + +
+
+ - Other software projects hosted at svjatoslav.eu + +
Created: 2016-08-03 Wed 23:45
- + +1.1 Source code
+-
+
- Download latest snapshot in TAR GZ format + + +
- Browse Git repository online + + +
- Clone Git repository using command:
+
+git clone http://www2.svjatoslav.eu/git/sixth-data.git + +
+
+
2 Vision / goal
++Provide a versioned, clustered, flexible, distributed, multi-dimensional +data storage engine for the Sixth computation engine. +
+ +-
+
- Speaking of traditional relational databases and object-oriented
+business applications:
+
+
-
+
- I hate object-relational impedance mismatch. + + +
- I don't like converting data between the persistent database and +runtime objects for every transaction. How about creating a unified +database/computation engine instead, to: + + +
- Eliminate the constant moving and converting of data between two systems +and make computation happen close to where the data is stored. + + +
- Abstract away the difference between RAM and persistent storage. Let +the system decide at runtime which data to keep in which kind of +memory. + +
+
3 Inspiration
+-
+
- Relational databases:
+
-
+
- Transactional. + +
- Indexable / Quickly searchable. + +
+
+ - Git (version control system)
+
-
+
- Versionable + +
- Branchable / mergeable. + +
- Transparent consistency, checksumming and deduplication. + +
- (Git as a database: + +
+https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/ ) +
+
+
4 Ideas
+4.1 Distributed computation and data storage
++Maybe every problem can be translated to geometry (use any shapes and +as many dimensions as you need). Solution(s) to such problems would +then appear as relatively simple search/comparison/lookup results. As +a bonus, such geometrical *data storage* AND *computation* can +naturally be made *parallel* and *distributed*. That's what neurons in +the brain appear to be doing! :) Learning means building/updating +the model (the hard part). Question answering means making (relatively +simple) lookups (geometrical queries) against the model. +
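One way to picture "question answering as a geometrical lookup" is a nearest-point query. The sketch below is only an illustration under stated assumptions: the `model` contents, the two dimensions and the `nearest` helper are all hypothetical and are not part of Sixth.

```python
import math

# Hypothetical model: labeled points in a 2-dimensional space.
# The labels and coordinates are made up for illustration.
model = {
    "cold": (0.0, 1.0),
    "warm": (20.0, 0.5),
    "hot": (35.0, 0.1),
}


def nearest(query):
    """Answer a question by finding the model point closest to the query."""
    # Every distance is computed independently of the others, so this
    # step could be split across many workers or machines.
    return min(model, key=lambda label: math.dist(model[label], query))


answer = nearest((18.0, 0.6))
```

Building `model` corresponds to the learning step; `nearest` corresponds to the comparatively simple lookup against it.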
+4.2 Mapping of hyperspace to traditional object-oriented model
++Object-oriented programming is inspired by the way the human mind +operates. It allows the programmer to express ideas to the computer in +more human-like terms. +
+ ++It is possible to map the object model to a geometrical hyperspace: +
+ +-
+
- An object is a point in space (universe). Each object member variable +translates to its own dimension. That is: if a class declares 4 +variables for an object, then the corresponding object can be stored as +a single point inside a 4-dimensional space. Variable values translate +to point coordinates in space. That is: an integer, a floating point +number and even a boolean or a string can be translated to a linear value +that can be used as a coordinate along a particular dimension. + + +
- Each class declares its own space (universe). All class instances +(objects) are points inside that particular universe. References +between objects of different types are hyperlinks (portals) between +different universes. + +
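The object-to-point mapping described above could be sketched roughly as follows. This is a minimal illustration, not Sixth code: the `Book` class, the `to_coordinate` and `to_point` helpers, and in particular the string-to-number mapping are assumptions made for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class Book:
    # Hypothetical example class; each of its 4 fields becomes one dimension.
    pages: int
    price: float
    in_print: bool
    title: str


def to_coordinate(value):
    """Translate a member value to a number usable as a coordinate."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # Assumed mapping: read the UTF-8 bytes as a base-256 fraction in
        # [0, 1). Byte order is preserved only for the first few bytes,
        # because a float has limited precision.
        data = value.encode("utf-8")
        return sum(b / 256 ** (i + 1) for i, b in enumerate(data))
    raise TypeError(f"no coordinate mapping for {type(value).__name__}")


def to_point(obj):
    """Return the object as a point: one coordinate per declared field."""
    return tuple(to_coordinate(getattr(obj, f.name)) for f in fields(obj))


point = to_point(Book(pages=320, price=19.5, in_print=True, title="Dune"))
```

Here `point` has 4 coordinates, one per field of `Book`, so every `Book` instance lands somewhere in the same 4-dimensional universe.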
4.3 Handling of relations
++Consider that we want to create a database of books and authors. A book can +have multiple authors, and a single person can be an author of multiple +books. It is possible to store how many hours of work each author has +contributed to every book, using hyperspace as follows: +
+ +-
+
- Every dimension corresponds to one particular book author. (10
+authors in the database would require a 10-dimensional space.)
+
-
+
- Point in space corresponds to one particular book.
+
-
+
- Point location along a particular (author) dimension corresponds +to the amount of work contributed by that author to the given +book. + +
+
+
+Alternatively: +
+ +-
+
- Every dimension corresponds to one particular book.
+
-
+
- Point in space corresponds to one particular author in the entire
+database.
+
-
+
- Point location along a particular (book) dimension corresponds to +the amount of work contributed to that book by the given author (point). + +
+
+
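The two layouts above hold the same information and differ only in which entity supplies the dimensions. A small sketch, using made-up sample data and hypothetical helper names (`book_point`, `author_point`):

```python
# Hypothetical sample data: hours of work per (book, author) pair.
hours = {
    ("Book A", "Alice"): 120.0,
    ("Book A", "Bob"): 30.0,
    ("Book B", "Bob"): 200.0,
}

authors = sorted({author for _, author in hours})
books = sorted({book for book, _ in hours})


def book_point(book):
    """First layout: a book is a point, with one dimension per author."""
    return tuple(hours.get((book, a), 0.0) for a in authors)


def author_point(author):
    """Alternative layout: an author is a point, with one dimension per book."""
    return tuple(hours.get((b, author), 0.0) for b in books)


book_points = {b: book_point(b) for b in books}
author_points = {a: author_point(a) for a in authors}
```

An author who did not contribute to a book simply has coordinate 0 along that dimension, so one layout is the transpose of the other.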
4.4 Layered architecture
+-
+
- layer 1
- disk / block storage / partition + + +
- layer 2
- key/value storage. Keys are unique and are dictated by the +storage engine. Each value is an arbitrary byte array of limited +size. This layer is responsible for handling disk +defragmentation and consistency in case of crash +recovery. + + +
- layer 3
- key/value storage. Keys are content hashes. Values are +arbitrary content byte arrays of limited size. This +layer effectively implements content-addressable +storage. Content-addressable storage enables Git-like +behavior (possibility of competing branches, retaining +history, transparent deduplication). + + +
- layer 4
- Implements an arbitrary-dimensional multiverse. + + +
- layer 5
- Distributed computation engine. + +
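To make the split between layers 2 and 3 more concrete, here is a rough in-memory sketch. It is not the Sixth implementation: the class names, the 4096-byte size limit, the use of SHA-256 and the in-memory index are all assumptions chosen for illustration, and the disk handling of layers 1 and 2 is left out entirely.

```python
import hashlib


class KeyValueStore:
    """Stand-in for layer 2: the store itself dictates the keys."""

    def __init__(self, max_value_size=4096):  # size limit is an assumption
        self._values = {}
        self._next_key = 0
        self._max_value_size = max_value_size

    def put(self, value: bytes) -> int:
        if len(value) > self._max_value_size:
            raise ValueError("value exceeds size limit")
        key = self._next_key
        self._next_key += 1
        self._values[key] = value
        return key

    def get(self, key: int) -> bytes:
        return self._values[key]


class ContentStore:
    """Stand-in for layer 3: keys are content hashes."""

    def __init__(self, backend: KeyValueStore):
        self._backend = backend
        self._index = {}  # content hash -> layer 2 key (kept in memory here)

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()  # hash choice is an assumption
        if digest not in self._index:  # identical content is stored only once
            self._index[digest] = self._backend.put(content)
        return digest

    def get(self, digest: str) -> bytes:
        return self._backend.get(self._index[digest])
```

Because the key is derived from the content, storing the same bytes twice yields the same key and no second copy, which is the transparent deduplication mentioned for layer 3.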
5 Current status
+-
+
- More or less defined Vision / goal. + + +
- Collected some ideas. + + +
- Implemented a very simple persistent key-value map.
+
-
+
- The long-term goal is to use it as a backing storage engine and +implement more advanced features on top of it. + +
+
6 See also
++Interesting or competing projects with good ideas: +
+ +-
+
- GRAKN.AI
+
-
+
- database in the form of a knowledge graph that uses machine +reasoning to simplify data processing challenges for AI +applications. https://grakn.ai/ + +
+
+ - Magma
+
-
+
- multi-user object database for Squeak +http://wiki.squeak.org/squeak/2665 + +
+
+ - Gemstone/S
+
-
+
- Completely distributed Smalltalk-based computing +system. http://esug.org/data/ESUG2015/3%20wednesday/1100-1130%20SQL%20Queries%20on%20Smalltalk%20Objects/SQL%20Queries%20in%20Smalltalk%20(James%20Foster).pdf + +
+
+ - TAOS
+
-
+
- Completely distributed operating system/virtual machine: +http://www.uruk.org/emu/Taos.html + +
+