X-Git-Url: http://www2.svjatoslav.eu/gitweb/?p=sixth-data.git;a=blobdiff_plain;f=doc%2Findex.html;h=e7a0008425cdc1c7f1bb860c944b0f458563f05f;hp=8793e4565e2e7e39d3822467b63f6e6002838b0f;hb=0c6395b0e8c23f03bd20fc4eef32365987a916bc;hpb=45c3bd94e38fc74268429d4441e730aec28492bf diff --git a/doc/index.html b/doc/index.html index 8793e45..e7a0008 100644 --- a/doc/index.html +++ b/doc/index.html @@ -1,350 +1,620 @@ - - - + + - - - - Sixth - system for data storage, computation, exploration and interaction - - - - - +Sixth Data - Data storage and computing engine + + + + + + + + + + + -
-

Sixth - system for data storage, computation, exploration and interaction

-
-

Table of Contents

- -
-
- - - -
-

1 Current status

-
-
    -
  • Implemented very simple persistent key-value map.
  • -
- -

- Long term goal is to implement more advanced features on top of this. -

-
-
+
+

Sixth Data - Data storage and computing engine

+ +
+

1 General

+
+
-
-

Author: Svjatoslav Agejenko

-

Created: 2016-08-03 Wed 23:45

-

Validate

+ +
+

1.1 Source code

+
+ +
+
+
+ +
+

2 Vision / goal

+
+

+Provide a versioned, clustered, flexible, distributed, multi-dimensional +data storage engine for the Sixth computation engine. +

+ +
    +
  • Speaking of traditional relational databases and object-oriented +business applications: + +
      +
    • I hate object-relational impedance mismatch. +
    • + +
    • I don't like to convert data between a persistent database and +runtime objects for every transaction. How about creating a unified +database/computation engine instead, to: +
    • + +
    • Eliminate the constant moving and converting of data between two systems +and make computing happen close to where the data is stored. +
    • + +
    • Abstract away the difference between RAM and persistent storage. Let +the system decide at runtime which data to keep in what kind of +memory. +
    • +
    +
  • +
+
+
+ +
+

3 Inspiration

+
+
    +
  • Relational databases: +
      +
    • Transactional. +
    • +
    • Indexable / Quickly searchable. +
    • +
    +
  • + +
  • Git (version control system) +
      +
    • Versionable +
    • +
    • Branchable / mergeable. +
    • +
    • Transparent consistency, checksumming and deduplication. +
    • +
    • (Git as a database: +
    • +
    +

    +https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/ ) +

    +
  • +
+
+ + + +
+

3.2 CM-1 Connection Machine

+
+

+https://en.wikipedia.org/wiki/Connection_Machine +

+ +
    +
  • see: Geometrical computation +
  • + +
  • Computation unit has local CPU and RAM. +
  • + +
  • Data is pre-distributed across computation units. +
  • + +
  • The machine's internal 12-dimensional hypercube network can +efficiently simulate a network topology of arbitrary dimensionality between +computational units. So when we are solving/simulating, for +example, a 5-dimensional problem, we can arrange the computational units +into a virtual 5D network. See: +http://www.mission-base.com/tamiko/theory/cm_txts/di-ch2.html +
  • +
+
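A minimal sketch of how addressing in a hypercube network is commonly done, assuming each node is identified by a 12-bit integer and is wired to every node whose identifier differs in exactly one bit. The function names are hypothetical and are not taken from the Connection Machine documentation.

```python
def hypercube_neighbors(node: int, dimensions: int) -> list[int]:
    """Return the IDs of all nodes directly wired to `node`.

    Flipping one bit of the ID moves one step along one hypercube axis.
    """
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hop_distance(a: int, b: int) -> int:
    """Minimum number of hops between two nodes (their Hamming distance)."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    print(hypercube_neighbors(0, 12))   # the 12 direct neighbours of node 0
    print(hop_distance(0b000, 0b111))   # nodes differing in 3 bits are 3 hops apart
```

Because any two nodes are at most 12 hops apart, lower-dimensional virtual topologies can be laid over such a network with short routes between virtual neighbours.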
+
+
+ +
+

4 Ideas

+
+
+

4.1 Geometrical computation

+
+ +
+ +
+

4.1.1 Distributed computation and data storage

+
+

+Maybe every problem can be translated to geometry (use any shapes and +as many dimensions as you need). Solution(s) to such problems would +then appear as relatively simple search/comparison/lookup results. As +a bonus, such geometrical *data storage* AND *computation* can +naturally be made *parallel* and *distributed*. That's what neurons in +the brain appear to be doing! :) Learning means building/updating +the model (the hard part). Question answering means making (relatively +simple) lookups (geometrical queries) against the model. +

+
+
+
+

4.1.2 Mapping of hyperspace to traditional object-oriented model

+
+

+Object-oriented programming is inspired by the way the human mind +operates. It allows the programmer to express ideas to the computer in more +human-like terms. +

+ +

+It is possible to map the object model to a geometrical hyperspace: +

+ +
    +
  • An object is a point in space (universe). Each object member variable +translates to its own dimension. That is: if a class declares 4 +variables for an object, then the corresponding object can be stored as +a single point inside a 4-dimensional space. Variable values translate +to point coordinates in space. That is: an integer, a floating point +number and even a boolean or a string can be translated to a linear value +that can be used as a coordinate along a particular dimension. +
  • + +
  • Each class declares its own space (universe). All class instances +(objects) are points inside that particular universe. References +between objects of different types are hyperlinks (portals) between +different universes. +
  • +
+
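The mapping described above can be sketched roughly as follows. The `Person` class, its fields, and the string encoding are illustrative assumptions only; the document does not say how a string should be turned into a linear value. Here the first 6 UTF-8 bytes are read as a base-256 fraction, which preserves byte-wise ordering but makes strings that differ only after those bytes collide.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:  # hypothetical class with 4 member variables -> 4D universe
    age: int
    height_m: float
    employed: bool
    name: str


def to_coordinate(value) -> float:
    """Translate one member variable into a coordinate along its dimension."""
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # Assumed encoding: leading bytes as a base-256 fraction in [0, 1).
        data = value.encode("utf-8")[:6]
        return sum(byte / 256 ** (i + 1) for i, byte in enumerate(data))
    raise TypeError(f"no linear mapping defined for {type(value).__name__}")


def to_point(obj) -> tuple[float, ...]:
    """One object becomes one point; one coordinate per declared field."""
    return tuple(to_coordinate(getattr(obj, f.name)) for f in fields(obj))


if __name__ == "__main__":
    print(to_point(Person(age=30, height_m=1.8, employed=True, name="Ann")))
```

A reference to an object of another class would not become a coordinate in this sketch; per the text it would be a link into that other class's universe.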
+
+
+

4.1.3 Handling of relations

+
+

+Suppose we want to create a database of books and authors. A book can +have multiple authors, and a single person can be an author of multiple +books. It is possible to store how many hours of work each author has +contributed to every book, using hyperspace as follows: +

+ +
    +
  • Every dimension corresponds to one particular book author (10 +authors in the database would require a 10-dimensional space). +
      +
    • Point in space corresponds to one particular book. +
        +
      • The point's location along a particular (author) dimension corresponds +to the amount of work contributed by that author to the given +book. +
      • +
      +
    • +
    +
  • +
+ +

+Alternatively: +

+ +
    +
  • Every dimension corresponds to one particular book. +
      +
    • Point in space corresponds to one particular author in the entire +database. +
        +
      • The point's location along a particular (book) dimension corresponds to +the amount of work contributed to that book by the given author (point). +
      • +
      +
    • +
    +
  • +
+
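Both arrangements above can be sketched with plain tuples as points. The author names, book titles and hour values below are made-up illustration data, and the helper names are hypothetical.

```python
# One dimension per author; the order of this list fixes the axis order.
authors = ["Alice", "Bob", "Carol"]

# Each book is a point in 3D author-space; coordinates are hours of work.
books = {
    "Book A": (120.0, 0.0, 40.0),
    "Book B": (0.0, 300.0, 0.0),
}


def hours(book: str, author: str) -> float:
    """Read one coordinate: work contributed by `author` to `book`."""
    return books[book][authors.index(author)]


def books_by(author: str) -> list[str]:
    """Geometrical query: all points with a positive coordinate on one axis."""
    axis = authors.index(author)
    return [title for title, point in books.items() if point[axis] > 0]


def author_points() -> dict[str, tuple[float, ...]]:
    """Alternative view: one dimension per book, each author is a point."""
    return {
        author: tuple(point[axis] for point in books.values())
        for axis, author in enumerate(authors)
    }


if __name__ == "__main__":
    print(hours("Book A", "Carol"))
    print(books_by("Alice"))
    print(author_points())
```

The two views hold the same data; `author_points` is simply the transpose of `books`.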
+
+
+ +
+

4.2 Layered architecture

+
+
+
layer 1
disk / block storage / partition +
+ +
layer 2
key/value storage. Keys are unique and are dictated by the +storage engine. A value is an arbitrary byte array of +limited size. This layer is responsible for handling disk +defragmentation and consistency in case of crash +recovery. +
+ +
layer 3
key/value storage. Keys are content hashes. Values are +arbitrary content byte arrays of limited size. This +layer effectively implements content addressable +storage. Content addressable storage enables Git-like +behavior (possibility of competing branches, retaining +history, transparent deduplication). +
+ +
layer 4
Implements arbitrary dimensional multiverse. +
+ +
layer 5
Distributed computation engine. +
+
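As a rough illustration of layer 3, the sketch below implements a content addressable store in memory. SHA-256 and the `ContentStore` class are assumptions made for the example; the document names neither a hash function nor an API. A plain dictionary stands in for layer 2, so the extra index a real layer 3 would need (from content hash to engine-dictated layer 2 key) is left out.

```python
import hashlib


class ContentStore:
    """Toy content addressable storage: the key is the hash of the value."""

    def __init__(self) -> None:
        # Stand-in for the lower key/value layer (assumption for this sketch).
        self._blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store content and return its key; identical content is stored once."""
        key = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        """Fetch content and verify that it still matches its key."""
        content = self._blobs[key]
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError("stored content does not match its hash")
        return content

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = ContentStore()
    first = store.put(b"hello")
    second = store.put(b"hello")      # duplicate: no new entry is created
    print(first == second, len(store))
```

Because a key never changes meaning, older versions stay addressable after newer ones are written, which is what makes history and competing branches cheap to keep.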
+
+
+
+
+

5 Current status

+
+
    +
  • More or less defined Vision / goal. +
  • + +
  • Collected some ideas. +
  • + +
  • Implemented very simple persistent key-value map. +
      +
    • The long-term goal is to use it as a backing storage engine and +implement more advanced features on top of it. +
    • +
    +
  • +
+
+
+ +
+

6 See also

+
+

+Interesting or competing projects with good ideas: +

+ +
    +
  • CM-1 Connection Machine +
  • + +
  • GRAKN.AI +
      +
    • database in the form of a knowledge graph that uses machine +reasoning to simplify data processing challenges for AI +applications. https://grakn.ai/ +
    • +
    +
  • + +
  • Magma +
      +
    • Multi-user object database for Squeak +
    • +
    +
  • + +
  • Gemstone/S +
      +
    • Completely distributed Smalltalk-based computing +system. +
    • +
    +
  • + +
  • TAOS +
      +
    • Completely distributed operating system/virtual machine: +
    • +
    +
  • + +
  • ChrysaLisp +
      +
    • Assembler/C-Script/Lisp 64 bit, MIMD, multi CPU, multi threaded, +multi core, multi user Parallel OS. With GUI, Terminal, OO +Assembler, Class libraries, C-Script compiler, Lisp interpreter, +Debugger, and more… +
    • +
    +
  • +
+
+
+
+