From 1c5ed72204c3c535b06b2215c50d54fbdb18ee98 Mon Sep 17 00:00:00 2001 From: Svjatoslav Agejenko Date: Wed, 29 Jan 2020 22:51:40 +0200 Subject: [PATCH 1/1] Updated vision --- doc/index.html | 316 +++++++++++++++++------------------------- doc/index.org | 211 +++++++++++++--------------- tools/update web site | 4 +- 3 files changed, 224 insertions(+), 307 deletions(-) diff --git a/doc/index.html b/doc/index.html index 6d728f3..9b75295 100644 --- a/doc/index.html +++ b/doc/index.html @@ -2,7 +2,7 @@ Sixth Data - Data storage and computing engine - + @@ -201,7 +201,8 @@ $(function() {

1 General

@@ -254,89 +252,54 @@ git clone https://www2.svjatoslav.eu/git/sixth-data.git

2 Vision / goal

-Provide versioned, clustered, flexible, distributed, multi-dimensional -data storage engine for the Sixth computation engine. +Provide a hackable, versioned, optimized, distributed, geometrical, +arbitrary-dimensional (hypercube-based) data storage and computation +engine (as inspired by the brain) for the general-purpose visual computing +environment called Sixth.

- +

+Because Lisp is a hackable, self-defined, programmable programming +language, it would be used to provide imperative programming support. +

-
-

3 Inspiration

+

3 Inspiration

-

3.1 Brain

@@ -349,60 +312,77 @@ with CM-1 Connection Machine design. https://en.wikipedia.org/wiki/Connection_Machine

-
    -
  • see: Geometrical computation -
  • - -
  • Computation unit has local CPU and RAM. -
  • - -
  • Data is pre-distributed across computation units. -
  • - -
  • Machine's internal 12-dimensional hypercube network allows to -efficiently simulate arbitrary dimensional network topology between -computational units. So that when we are solving/simulating for -example 5 dimensional problem, we can arrange computational units -into virtual 5D network. See: +

    +Massively parallel computation units (thousands of CPUs) are connected via the +machine's internal 12-dimensional hypercube network, which makes it possible to +efficiently simulate arbitrary-dimensional hypercube and network +topologies between computational units. So when we are +solving/simulating, for example, a 5-dimensional problem, we can arrange +computational units into a virtual 5D network. See: http://www.mission-base.com/tamiko/theory/cm_txts/di-ch2.html -

  • -
+

+ +

+We can pre-distribute data across computation units and perform +parallel geometrical computation. +
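As a rough illustration of how a hypercube network can host a lower-dimensional virtual grid, the sketch below gives every hypercube node a bit-string address (neighbours differ in exactly one bit) and uses reflected Gray codes so that neighbouring grid cells land on neighbouring hypercube nodes. This is a standard textbook embedding, not code from Sixth Data or the CM-1; all function names are hypothetical.

```python
def hypercube_neighbors(node, dims):
    """Neighbours of a node: flip each of the `dims` address bits in turn."""
    return [node ^ (1 << bit) for bit in range(dims)]


def gray(i):
    """Reflected binary Gray code: consecutive integers differ in one bit."""
    return i ^ (i >> 1)


def embed_grid_point(coords, bits_per_axis):
    """Map an n-dimensional grid coordinate to a hypercube node address.

    Each axis gets its own group of `bits_per_axis` address bits, Gray-coded,
    so a step of one along any axis changes exactly one bit of the address.
    """
    address = 0
    for c in coords:
        address = (address << bits_per_axis) | gray(c)
    return address


def hamming(a, b):
    """Number of address bits in which two nodes differ."""
    return bin(a ^ b).count("1")
```

With 12 address bits, a 16 x 16 x 16 grid uses 4 bits per axis, and a 4 x 4 x 4 x 4 x 4 (5D) grid uses 2 bits per axis, leaving 2 bits unused.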

-

4 Ideas

+

4 Reasons for hypercube as a so called first class citizen

-
-

4.1 Geometrical computation

-
    -
  • Inspired by Brain. +
+  • Hypercube is a quite general-purpose data structure that naturally +encapsulates a wide variety of data and problems.
  • -
  • Wits nicely with CM-1 Connection Machine properties. + +
  • Nicely captures apparent properties of the brain. +
  • + +
  • Naturally supports distributed and parallel geometrical data storage +and computation. +
  • + +
+  • Dedicated hardware like CM-1 can be built around the hypercube concept, +so that data, computation process and hardware all +fit together beautifully while complementing each other's +strengths.
  • + +
+  • Hypercube-stored data (and the computation process) has geometry by its +nature and should fit nicely with the "3D first" user interface ideology +of the parent Sixth project.
+
+
+

5 Geometrical computation idea

+
+
+
+

5.1 Distributed computation and data storage

+
+

+Many problems can be translated to geometry (use any shapes and as +many dimensions as you need). Solution(s) to such problems could +then be found via geometrical search/comparison/lookup. As a +bonus, such geometrical *data storage* AND *computation* can +naturally be made *parallel* and *distributed*. +

-
-

4.1.1 Distributed computation and data storage

-

-Maybe every problem can be translated to geometry (use any shapes and -as many dimensions as you need). Solution(s) to such problems would -then appear as relatively simple search/comparison/lookup results. As -a bonus, such geometrical *data storage* AND *computation* can be -naturally made in *parallel* and *distributed*. That's what neurons in -the brain appear to be doing ! :) . Learning means building/updating -the model (the hard part). Question answering is making (relatively -simple) lookups (geometrical queries) against the model. +Learning means building/updating/re-balancing the model (the hard +part). Question answering is making (relatively simple) lookups +(geometrical queries) against the model.
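One possible reading of "learning builds the model, answering is a geometrical lookup" is a nearest-neighbour query over stored points. The sketch below is only an illustration under that assumption; the class and method names are hypothetical and nothing here reflects the actual Sixth Data design.

```python
import math


class GeometricModel:
    """Toy model: a bag of labelled points in an n-dimensional space."""

    def __init__(self):
        self.points = []  # list of (coordinates, label) pairs

    def learn(self, coords, label):
        """Building/updating the model: store one more labelled point."""
        self.points.append((tuple(coords), label))

    def answer(self, query):
        """Geometrical query: return the label of the closest stored point."""
        coords, label = min(self.points, key=lambda p: math.dist(p[0], query))
        return label
```

A linear scan is used for brevity; a real engine would presumably partition the space so that lookups can run in parallel across computation units.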

-
-

4.1.2 Mapping of hyperspace to traditional object-oriented model

-
+
+

5.2 Mapping hypercube to object-oriented model and relational database

+

Object oriented programming is inspired by the way human mind operates. It allows programmer to express ideas to computer in a more @@ -410,120 +390,78 @@ human-like terms.

-It is possible to map object model to geometrical hyperspace: +It is also possible to map the object model and a relational +database to geometrical hyperspace:

    -
  • Object is a point in space (universe). Each object member variable -translates to its own dimension. That is: if class declares 4 -variables for an object, then corresponding object can be stored as -a single point inside 4 dimensional space. Variable values translate -to point coordinates in space. That is: Integer, floating point -number and even boolean and string can be translated to linear value -that can be used as a coordinate along particular dimension. +
+  • An object or database table row is a point in the hypercube's arbitrary +dimensional space. Each object member variable or database table +column can be mapped to its own dimension in the hypercube. That is: if a +class declares 4 variables for an object, then the corresponding object +can be stored as a single point inside a 4-dimensional +hypercube. Variable values translate to point coordinates in that +hypercube. That is: numbers and strings can be translated to a linear +value that can be used as a coordinate along a particular dimension.
  • -
  • Each class declares its own space (universe). All class instances -(objects) are points inside that particular universe. References -between objects of different types are hyperlinks (portals) between -different universes. +
+  • Each object class or database table declares its own hypercube that +contains instances (objects) of that class or rows of that table.
-
-

4.1.3 Handling of relations

-
+ +
+
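The object-to-point mapping described above could look roughly like the following sketch. The `Person` class, the helper names, and the string encoding (first 8 UTF-8 bytes read as a fraction in [0, 1)) are all assumptions made for illustration; the text does not specify how strings become linear values.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    # Four member variables, so each instance is a point in a 4D space.
    age: int
    height_cm: float
    weight_kg: float
    name: str


def to_coordinate(value, string_prefix_bytes=8):
    """Translate one member value to a linear coordinate (hypothetical scheme)."""
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        # Read a fixed-length byte prefix as a big-endian fraction; this keeps
        # byte-wise ordering of strings (up to the prefix and float rounding).
        n = string_prefix_bytes
        raw = value.encode("utf-8")[:n].ljust(n, b"\0")
        return int.from_bytes(raw, "big") / 256 ** n
    raise TypeError("unsupported member type: %r" % type(value))


def to_point(obj):
    """One coordinate per declared member variable, in declaration order."""
    return tuple(to_coordinate(getattr(obj, f.name)) for f in fields(obj))
```

Calling `to_point(Person(30, 180.0, 75.5, "Alice"))` yields a 4-tuple of floats, one per dimension of the class's hypercube.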

5.3 Mapping entity relations in hypercube

+

-Consider we want to create database of books and authors. Book can -have multiple authors, and single person can be author for multiple -books. It is possible to store how many hours of work each author has -contributed to every book, using hyperspace as follows: +Suppose we want to create a database of:

-
    -
  • Every dimension corresponds to one particular book author. (10 -authors in the database, would require 10 dimensional space) -
      -
    • Point in space corresponds to one particular book. -
        -
      • Point location along particular (author) dimension corresponds -to amount of work contributed by particular author for given -book. +
      • Books.
      • -
      +
    • Authors.
    • -
    +
  • Effort: Amount of time contributed by every author to every book +that he/she wrote.

-Alternatively: +The information above can be represented as a 3D cube whose dimensions are:

- -
    -
  • Every dimension corresponds to one particular book.
      -
    • Point in space corresponds to one particular author in the entire -database. -
        -
      • Point location along particular (book) dimension corresponds to -amount of work contributed for book by given author (point). +
      • X: Book
      • -
      +
    • Y: Author
    • -
    +
  • Z: Effort
-
-
-
-
-

4.2 Layered architecture

-
-
-
layer 1
disk / block storage / partition -
- -
layer 2
key/value storage. Keys are unique and are dictated by -storage engine. Value is arbitrary but limited size byte -array. This layer is responsible for handling disk -defragmentation and consistency in case of crash -recovery. -
- -
layer 3
key/value storage. Keys are content hashes. Values are -arbitrary but limited size content byte arrays. This -layer effectively implements content addressable -storage. Content addressible storage enables GIT-like -behavior (possibility for competing branches, retaining -history, transparent deduplication) -
- -
layer 4
Implements arbitrary dimensional multiverse. -
- -
layer 5
Distributed computation engine. -
-
+

+Points in that cube would nicely capture many-to-many relations +between authors and books. +
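A minimal sketch of the Book / Author / Effort cube, assuming each point is simply a `(book, author, effort_hours)` triple. The sample data and function names are made up for illustration and are not part of the project.

```python
# Each point in the 3D cube: (X: book, Y: author, Z: effort in hours).
SAMPLE_POINTS = {
    ("Book A", "Alice", 120.0),
    ("Book A", "Bob", 40.0),
    ("Book B", "Alice", 15.0),
}


def authors_of(points, book):
    """Slice the cube at a fixed X (book) and project onto Y (author)."""
    return sorted(a for b, a, _ in points if b == book)


def books_by(points, author):
    """Slice the cube at a fixed Y (author) and project onto X (book)."""
    return sorted(b for b, a, _ in points if a == author)


def total_effort(points, book):
    """Sum the Z (effort) coordinate over all points of one book."""
    return sum(e for b, _, e in points if b == book)
```

Because a book and an author can each appear in any number of points, the same set of points answers queries in both directions of the many-to-many relation.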

-
-

5 Current status

-
+
+

6 Current status

+
  • More or less defined Vision / goal.
  • -
  • Collected some ideas. +
  • Collected some inspiring ideas.
  • Implemented very simple persistent key-value map.
    • Long term goal is to use it as a backing storage engine and -implement more advanced features on top of this. +implement more advanced features on top of this via layered +architecture.
  • @@ -531,9 +469,9 @@ implement more advanced features on top of this.
- diff --git a/doc/index.org b/doc/index.org index 3b8accc..88a1d75 100644 --- a/doc/index.org +++ b/doc/index.org @@ -11,7 +11,8 @@ #+HTML_HEAD: * General -- This is a subproject of [[https://www3.svjatoslav.eu/projects/sixth/][Sixth]] +- This is a subproject of [[https://www3.svjatoslav.eu/projects/sixth/][Sixth]] with the goal of providing data + storage and computation facilities. - This program is free software: you can redistribute it and/or modify it under the terms of the [[https://www.gnu.org/licenses/lgpl.html][GNU Lesser General Public License]] as @@ -33,56 +34,41 @@ - Clone Git repository using command: : git clone https://www2.svjatoslav.eu/git/sixth-data.git -- See [[https://www3.svjatoslav.eu/projects/sixth-data/apidocs/][JavaDoc]]. - * Vision / goal :PROPERTIES: :ID: f6764282-a6f6-44e6-8716-b428074dd093 :END: -Provide versioned, clustered, flexible, distributed, multi-dimensional -data storage engine for the [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth computation engine]]. - -+ Speaking of traditional relational database and object oriented - business applications: - - + I hate object-relational impedance mismatch. - - + I don't like to convert data between persistent database and - runtime objects for every transaction. How about creating united - database/computation engine instead to: - - + Eliminate constant moving and converting of data between 2 systems - and make computing happen close to where the data is stored. - - + Abstract away difference between RAM VS persistent storage. Let - the system decide at runtime which data to keep in what kind of - memory. 
+Provide hackable, versioned, optimized, distributed, geometrical, +arbitrary dimensional ([[id:96116550-a6a1-4700-bef7-865d0deee7ea][hypercube based]]) data storage and computation +engine ([[id:d2375acc-af14-4f18-8ad0-7949501178c5][as inspired by the brain]]) for general purpose visual computing +environment called [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth]]. +Because [[http://www.paulgraham.com/rootsoflisp.html][Lisp is hackable self defined programmable programming +language]] it would be used to provide [[https://en.wikipedia.org/wiki/Imperative_programming][imperative programming]] support. * Inspiration -+ Relational databases: - + Transactional. - + Indexable / Quickly searchable. - -+ Git (version control system) - + Versionable - + Branchable / mergeable. - + Transparent cansistency, checksumming and deduplication. - + (Git as a database: - https://www.kenneth-truyers.net/2016/10/13/git-nosql-database/ ) - +:PROPERTIES: +:ID: 0fa6354b-18c9-4120-bbf5-c7239aebecab +:END: ++ see also: [[https://en.wikipedia.org/wiki/OLAP_cube][OLAP cube]]. 
** Brain :PROPERTIES: :ID: d2375acc-af14-4f18-8ad0-7949501178c5 :END: -+ Brain appears to have more than 3D dimensional design: ++ Brain appears to be natural geometrical/parallel data storage and + computational engine: + + https://www.quantamagazine.org/the-brain-maps-out-ideas-and-memories-like-spaces-20190114/ + ++ Even more awesome is that brain appears to operate and is wired as + arbitrary/variable dimensional structure: https://singularityhub.com/2017/06/21/is-there-a-multidimensional-mathematical-world-hidden-in-the-brains-computation/ -+ Brain appears to use geometry to map thoughts and even sounds: - + https://www.quantamagazine.org/the-brain-maps-out-ideas-and-memories-like-spaces-20190114/ ++ On top of this, this multidimensional space that brain represents + has dynamic/variable resolution/density: + https://www.quantamagazine.org/goals-and-rewards-redraw-the-brains-map-of-the-world-20190328 -+ It directly inspires [[id:171fe375-c737-41e6-b429-a414f6abc5d8][Geometrical computation]] idea and nicely fits - with [[id:01aa65c1-3d44-44a8-9b90-58454bc6be80][CM-1 Connection Machine]] design. ++ Such properties allow parallel [[id:171fe375-c737-41e6-b429-a414f6abc5d8][Geometrical computation]] and + beautifully fits [[id:01aa65c1-3d44-44a8-9b90-58454bc6be80][CM-1 Connection Machine]] architecture (for extra + hardware accelerated solution). ** CM-1 Connection Machine :PROPERTIES: @@ -90,40 +76,55 @@ data storage engine for the [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/inde :END: https://en.wikipedia.org/wiki/Connection_Machine -+ see: [[id:171fe375-c737-41e6-b429-a414f6abc5d8][Geometrical computation]] +Massively parallel (thousands of CPUs) connected via +machine's internal 12-dimensional hypercube network allows to +efficiently simulate arbitrary dimensional hypercube and network +topology between computational units. 
So that when we are +solving/simulating for example 5 dimensional problem, we can arrange +computational units into virtual 5D network. See: +http://www.mission-base.com/tamiko/theory/cm_txts/di-ch2.html + +we can pre-distribute data across computation units and perform +parallel [[id:171fe375-c737-41e6-b429-a414f6abc5d8][geometrical computation]]. + +* Reasons for hypercube as a so called first class citizen +:PROPERTIES: +:ID: 96116550-a6a1-4700-bef7-865d0deee7ea +:END: ++ Hypercube is quite general purpose data structure that naturally + encapsulates wide variety data and problems. -+ Computation unit has local CPU and RAM. ++ Nicely captures apparent [[id:d2375acc-af14-4f18-8ad0-7949501178c5][properties of the brain]]. -+ Data is pre-distributed across computation units. ++ Naturally supports distributed and parallel [[id:171fe375-c737-41e6-b429-a414f6abc5d8][geometrical data storage + and computation.]] -+ Machine's internal 12-dimensional hypercube network allows to - efficiently simulate arbitrary dimensional network topology between - computational units. So that when we are solving/simulating for - example 5 dimensional problem, we can arrange computational units - into virtual 5D network. See: - http://www.mission-base.com/tamiko/theory/cm_txts/di-ch2.html ++ Dedicated hardware like [[id:01aa65c1-3d44-44a8-9b90-58454bc6be80][CM-1]] can be built around hypercube concept + that results in data, computation process and hardware, all + beautifully fitting together while complementing each other + strengths. -* Ideas -** Geometrical computation ++ Hypercube stored data (and computation process) has geometry by its + nature and should fit nicely with "3D first" user interface ideology + of the parent [[http://www2.svjatoslav.eu/gitbrowse/sixth/doc/index.html][Sixth]] project. +* Geometrical computation idea :PROPERTIES: :ID: 171fe375-c737-41e6-b429-a414f6abc5d8 :END: -+ Inspired by [[id:d2375acc-af14-4f18-8ad0-7949501178c5][Brain]]. 
-+ Wits nicely with [[id:01aa65c1-3d44-44a8-9b90-58454bc6be80][CM-1 Connection Machine]] properties. - -*** Distributed computation and data storage +** Distributed computation and data storage :PROPERTIES: :ID: 5d287158-53ea-44a2-a754-dd862366066a :END: -Maybe every problem can be translated to geometry (use any shapes and -as many dimensions as you need). Solution(s) to such problems would -then appear as relatively simple search/comparison/lookup results. As -a bonus, such geometrical *data storage* AND *computation* can be -naturally made in *parallel* and *distributed*. That's what neurons in -the brain appear to be doing ! :) . Learning means building/updating -the model (the hard part). Question answering is making (relatively -simple) lookups (geometrical queries) against the model. -*** Mapping of hyperspace to traditional object-oriented model +Lots of problems can be translated to geometry (use any shapes and as +many dimensions as you need). Solution(s) to such problems could be +then found via geometrical search/comparison/lookup results. As a +bonus, such geometrical *data storage* AND *computation* can be +naturally made in *parallel* and *distributed*. + +Learning means building/updating/re-balancing the model (the hard +part). Question answering is making (relatively simple) lookups +(geometrical queries) against the model. +** Mapping hypercube to object-oriented model and relational database :PROPERTIES: :ID: a117c11e-97c1-4822-88b2-9fc10f96caec :END: @@ -131,71 +132,47 @@ Object oriented programming is inspired by the way human mind operates. It allows programmer to express ideas to computer in a more human-like terms. -It is possible to map object model to geometrical hyperspace: - -+ Object is a point in space (universe). Each object member variable - translates to its own dimension. That is: if class declares 4 - variables for an object, then corresponding object can be stored as - a single point inside 4 dimensional space. 
Variable values translate - to point coordinates in space. That is: Integer, floating point - number and even boolean and string can be translated to linear value - that can be used as a coordinate along particular dimension. - -+ Each class declares its own space (universe). All class instances - (objects) are points inside that particular universe. References - between objects of different types are hyperlinks (portals) between - different universes. -*** Handling of relations +It is actually also possible to map object model and relational +database to geometrical hyperspace: + ++ Object or database table row is a point in hypercube arbitrary + dimensional space. Each object member variable or database table + column can be mapped to its own dimension in hypercube. That is: if + class declares 4 variables for an object, then corresponding object + can be stored as a single point inside 4 dimensional + hypercube. Variable values translate to point coordinates in that + hypercube. That is: numbers and string can be translated to linear + value that can be used as a coordinate along particular dimension. + ++ Each object class or database table declares its own hypercube that + contain instances (objects) of that class or rows of a table. + +** Mapping entity relations in hypercube :PROPERTIES: :ID: b6b15bd2-c78b-4c51-a343-72843a515c29 :END: -Consider we want to create database of books and authors. Book can -have multiple authors, and single person can be author for multiple -books. It is possible to store how many hours of work each author has -contributed to every book, using hyperspace as follows: - -+ Every dimension corresponds to one particular book author. (10 - authors in the database, would require 10 dimensional space) - + Point in space corresponds to one particular book. - + Point location along particular (author) dimension corresponds - to amount of work contributed by particular author for given - book. 
- -Alternatively: - -+ Every dimension corresponds to one particular book. - + Point in space corresponds to one particular author in the entire - database. - + Point location along particular (book) dimension corresponds to - amount of work contributed for book by given author (point). - -** Layered architecture -+ layer 1 :: disk / block storage / partition - -+ layer 2 :: key/value storage. Keys are unique and are dictated by - storage engine. Value is arbitrary but limited size byte - array. This layer is responsible for handling disk - defragmentation and consistency in case of crash - recovery. - -+ layer 3 :: key/value storage. Keys are content hashes. Values are - arbitrary but limited size content byte arrays. This - layer effectively implements content addressable - storage. Content addressible storage enables GIT-like - behavior (possibility for competing branches, retaining - history, transparent deduplication) - -+ layer 4 :: Implements arbitrary dimensional multiverse. - -+ layer 5 :: Distributed computation engine. +Consider we want to create database of: ++ Books. ++ Authors. ++ Effort: Amount of time contributed by every author to every book + that he/she wrote. + +Information above can be represented as 3D cube where dimensions are: ++ X: Book ++ Y: Author ++ Z: Effort + +Points in that cube would nicely capture many to many relations +between authors and the books. * Current status - More or less defined [[id:f6764282-a6f6-44e6-8716-b428074dd093][Vision / goal]]. -- Collected some [[id:d2375acc-af14-4f18-8ad0-7949501178c5][ideas]]. +- Collected some [[id:0fa6354b-18c9-4120-bbf5-c7239aebecab][inspiring]] [[id:171fe375-c737-41e6-b429-a414f6abc5d8][ideas]]. - Implemented very simple persistent key-value map. - Long term goal is to use it as a backing storage engine and - implement more advanced features on top of this. + implement more advanced features on top of this via layered + architecture. 
* See also Interesting or competing projects with good ideas: diff --git a/tools/update web site b/tools/update web site index c2219a2..3981451 100755 --- a/tools/update web site +++ b/tools/update web site @@ -5,8 +5,8 @@ cd .. mvn clean package -rm -rf doc/apidocs/ -cp -r target/apidocs/ doc/ +# rm -rf doc/apidocs/ +# cp -r target/apidocs/ doc/ rsync -avz --delete -e 'ssh -p 10006' doc/ n0@www3.svjatoslav.eu:/mnt/big/projects/sixth-data/ -- 2.20.1