Sixth Data - Data storage and computing engine

Table of Contents

1. General
  1.1. Source code
2. Vision / goal
3. Inspiration
  3.1. Brain
  3.2. CM-1 Connection Machine
4. Reasons for hypercube as a so-called first-class citizen
5. Geometrical computation idea
  5.1. Distributed computation and data storage
  5.2. Mapping hypercube to object-oriented model and relational database
  5.3. Mapping entity relations in hypercube
6. Current status
7. See also
  7.1. Computation on multi-dimensional data
  7.2. Distributed, reliable, parallel computing systems
  7.3. Rules based machine reasoning

1 General

This program is free software, released under the Creative Commons Zero (CC0) license.

Program author:
- Svjatoslav Agejenko
- Homepage: https://svjatoslav.eu
- Email: mailto://svjatoslav@svjatoslav.eu
- Other software projects hosted at svjatoslav.eu

1.1 Source code

- Download latest snapshot in TAR GZ format
- Browse Git repository online
- Clone Git repository using command:

```
git clone https://www2.svjatoslav.eu/git/sixth-data.git
```
- See JavaDoc.

2 Vision / goal

Provide a hackable, versioned, optimized, distributed, geometrical, arbitrary-dimensional (hypercube-based) data storage and computation engine (inspired by the brain) for the general-purpose visual computing environment called Sixth.

Because Lisp is a hackable, self-defined, programmable programming language, it would be used to provide imperative programming support.

3 Inspiration

See also: OLAP cube.

3.1 Brain

The brain appears to be a natural geometrical/parallel data storage and computational engine:
- https://www.quantamagazine.org/the-brain-maps-out-ideas-and-memories-like-spaces-20190114/

Even more remarkable is that the brain appears to operate as, and be wired as, an arbitrary/variable-dimensional structure:
- https://singularityhub.com/2017/06/21/is-there-a-multidimensional-mathematical-world-hidden-in-the-brains-computation/

On top of this, the multidimensional space that the brain represents has dynamic/variable resolution/density:
- https://www.quantamagazine.org/goals-and-rewards-redraw-the-brains-map-of-the-world-20190328

Such properties allow parallel geometrical computation and fit the CM-1 Connection Machine architecture beautifully (as an extra hardware-accelerated solution).

3.2 CM-1 Connection Machine

https://en.wikipedia.org/wiki/Connection_Machine

We can pre-distribute data across computation units and perform parallel geometrical computation.

4 Reasons for hypercube as a so-called first-class citizen

- The hypercube is a quite general-purpose data structure that naturally encapsulates a wide variety of data and problems.
- It nicely captures apparent properties of the brain.
- It naturally supports distributed and parallel geometrical data storage and computation.
- Dedicated hardware like CM-1 can be built around the hypercube concept, so that data, computation process and hardware all fit together beautifully while complementing each other's strengths.
- Hypercube-stored data (and the computation process) has geometry by its nature and should fit nicely with the "3D first" user interface ideology of the parent Sixth project.

5 Geometrical computation idea

5.1 Distributed computation and data storage

Lots of problems can be translated to geometry (use any shapes and as many dimensions as you need). Solutions to such problems can then be found via geometrical search/comparison/lookup. As a bonus, such geometrical data storage AND computation can naturally be made parallel and distributed.
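
To make the parallel/distributed part concrete, here is a minimal Java sketch under simplifying assumptions: points are plain `double[]` coordinates, each computation unit is represented by an in-memory slice of points (split into equal-width ranges along one chosen dimension), and the geometrical query is an axis-aligned box (range) search. The names `ParallelRangeQuery`, `partition` and `rangeQuery` are hypothetical and not part of Sixth Data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: slices of points act as computation units searched in parallel. */
public class ParallelRangeQuery {

    /** Assigns each point to one of `units` equal-width slices of [min, max] along dimension `dim`. */
    static List<List<double[]>> partition(List<double[]> points, int units,
                                          int dim, double min, double max) {
        List<List<double[]>> slices = new ArrayList<>();
        for (int i = 0; i < units; i++) {
            slices.add(new ArrayList<>());
        }
        double width = (max - min) / units;
        for (double[] p : points) {
            int unit = (int) ((p[dim] - min) / width);
            unit = Math.max(0, Math.min(units - 1, unit)); // clamp values on or beyond the edges
            slices.get(unit).add(p);
        }
        return slices;
    }

    /** True if point p lies inside the axis-aligned box [lo, hi] in every dimension. */
    static boolean inBox(double[] p, double[] lo, double[] hi) {
        for (int d = 0; d < p.length; d++) {
            if (p[d] < lo[d] || p[d] > hi[d]) {
                return false;
            }
        }
        return true;
    }

    /** Every slice is filtered independently; the parallel stream processes slices concurrently. */
    static List<double[]> rangeQuery(List<List<double[]>> slices, double[] lo, double[] hi) {
        return slices.parallelStream()
                .flatMap(slice -> slice.stream().filter(p -> inBox(p, lo, hi)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
                new double[] {1, 1}, new double[] {5, 5}, new double[] {9, 9});
        List<List<double[]>> slices = partition(points, 2, 0, 0, 10);
        List<double[]> hits = rangeQuery(slices, new double[] {0, 0}, new double[] {6, 6});
        System.out.println(hits.size()); // 2
    }
}
```

Because each slice is filtered independently, the same scheme would also work if the slices lived on separate machines. Slices whose range along the partitioning dimension does not overlap the query box could even be skipped entirely.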

Learning means building/updating/re-balancing the model (the hard part). Question answering means making (relatively simple) lookups (geometrical queries) against the model.
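
One way to picture the split between learning and question answering is a nearest-centroid model. It is used here purely as an illustration and is not claimed to be how Sixth Data works. Learning averages labelled example points into one centroid per label. Answering a question is a single geometrical lookup of the closest centroid. `NearestCentroidModel`, `learn` and `answer` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: learning builds a geometric model, answering is a lookup in it. */
public class NearestCentroidModel {

    private final Map<String, double[]> centroids = new HashMap<>();

    /** Learning (the expensive part): rebuild the model from labelled example points. */
    void learn(Map<String, List<double[]>> examples) {
        centroids.clear();
        for (Map.Entry<String, List<double[]>> entry : examples.entrySet()) {
            List<double[]> points = entry.getValue();
            double[] centroid = new double[points.get(0).length];
            for (double[] p : points) {
                for (int d = 0; d < centroid.length; d++) {
                    centroid[d] += p[d] / points.size();
                }
            }
            centroids.put(entry.getKey(), centroid);
        }
    }

    /** Question answering (the cheap part): return the label of the closest centroid. */
    String answer(double[] query) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : centroids.entrySet()) {
            double[] c = entry.getValue();
            double distance = 0; // squared Euclidean distance is enough for comparison
            for (int d = 0; d < c.length; d++) {
                double diff = c[d] - query[d];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NearestCentroidModel model = new NearestCentroidModel();
        model.learn(Map.of(
                "small", List.of(new double[] {1, 1}, new double[] {2, 2}),
                "large", List.of(new double[] {9, 9}, new double[] {10, 10})));
        System.out.println(model.answer(new double[] {3, 2})); // small
    }
}
```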

5.2 Mapping hypercube to object-oriented model and relational database

Object-oriented programming is inspired by the way the human mind operates. It allows the programmer to express ideas to the computer in more human-like terms.

It is also possible to map the object model and the relational database to a geometrical hyperspace:

- An object or a database table row is a point in an arbitrary-dimensional hypercube space. Each object member variable or database table column can be mapped to its own dimension in the hypercube. That is: if a class declares 4 variables for an object, then the corresponding object can be stored as a single point inside a 4-dimensional hypercube. Variable values translate to point coordinates in that hypercube: numbers and strings can be translated to linear values that can be used as coordinates along a particular dimension.
- Each object class or database table declares its own hypercube that contains the instances (objects) of that class or the rows of that table.
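
The mapping described above can be sketched in Java (version 16 or newer, for records and pattern matching). Each record component becomes one dimension. Numeric values are used as coordinates directly, while strings go through an assumed order-preserving encoding of their first 3 characters. The encoding, `HypercubeMapping` and `Person` are illustrative assumptions, not the project's actual scheme.

```java
import java.lang.reflect.RecordComponent;
import java.util.Arrays;

/** Hypothetical sketch: an object with N fields becomes one point in an N-dimensional hypercube. */
public class HypercubeMapping {

    /** Example class declaring 3 variables, so its instances are points in a 3D hypercube. */
    public record Person(String name, int age, double height) {}

    /**
     * Assumed string encoding: the first 3 UTF-16 characters are read as base-65536
     * digits (missing characters count as 0). This keeps the alphabetical order of
     * the first 3 characters and the 48-bit result fits exactly in a double.
     */
    static double stringToCoordinate(String s) {
        long value = 0;
        for (int i = 0; i < 3; i++) {
            value = value * 65536 + (i < s.length() ? s.charAt(i) : 0);
        }
        return value;
    }

    /** Translates one field value to a coordinate along that field's dimension. */
    static double toCoordinate(Object value) {
        if (value instanceof Number n) {
            return n.doubleValue();
        }
        if (value instanceof String s) {
            return stringToCoordinate(s);
        }
        throw new IllegalArgumentException("Unsupported field type: " + value);
    }

    /** Maps every record component, in declaration order, to one coordinate. */
    static double[] toPoint(Record object) {
        RecordComponent[] components = object.getClass().getRecordComponents();
        double[] point = new double[components.length];
        for (int i = 0; i < components.length; i++) {
            try {
                point[i] = toCoordinate(components[i].getAccessor().invoke(object));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        return point;
    }

    public static void main(String[] args) {
        double[] point = toPoint(new Person("Ann", 42, 1.7));
        System.out.println(Arrays.toString(point));
    }
}
```

A real engine would need an encoding that also distinguishes longer strings sharing a 3-character prefix. This sketch only shows that a class with N variables yields points in an N-dimensional hypercube.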

5.3 Mapping entity relations in hypercube

Suppose we want to create a database of:

- Books.
- Authors.
- Effort: the amount of time contributed by every author to every book that he/she wrote.

The information above can be represented as a 3D cube whose dimensions are:

- X: Book
- Y: Author
- Z: Effort

Points in that cube nicely capture the many-to-many relations between authors and books.
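
A minimal Java sketch of the book/author/effort cube follows. It assumes effort is measured in hours and keeps book and author coordinates as plain strings for readability (they could be mapped to numbers as described in section 5.2). Fixing a coordinate on one axis selects a slice of the cube, which answers the many-to-many questions in either direction. All names here are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of the Book x Author x Effort cube. */
public class BookAuthorCube {

    /** One point in the cube: X = book, Y = author, Z = effort (unit assumed to be hours). */
    record Point(String book, String author, double effortHours) {}

    /** Slice at X = book: all authors of that book. */
    static List<String> authorsOf(List<Point> cube, String book) {
        return cube.stream()
                .filter(p -> p.book().equals(book))
                .map(Point::author)
                .collect(Collectors.toList());
    }

    /** Slice at Y = author: all books of that author. */
    static List<String> booksOf(List<Point> cube, String author) {
        return cube.stream()
                .filter(p -> p.author().equals(author))
                .map(Point::book)
                .collect(Collectors.toList());
    }

    /** Sum along Z within the slice X = book: total effort spent on that book. */
    static double totalEffort(List<Point> cube, String book) {
        return cube.stream()
                .filter(p -> p.book().equals(book))
                .mapToDouble(Point::effortHours)
                .sum();
    }

    public static void main(String[] args) {
        List<Point> cube = List.of(
                new Point("Book A", "Alice", 120),
                new Point("Book A", "Bob", 40),
                new Point("Book B", "Alice", 80));
        System.out.println(authorsOf(cube, "Book A"));   // [Alice, Bob]
        System.out.println(booksOf(cube, "Alice"));      // [Book A, Book B]
        System.out.println(totalEffort(cube, "Book A")); // 160.0
    }
}
```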

6 Current status

- Vision / goal is more or less defined.
- Collected some inspiring ideas.
- Implemented a very simple persistent key-value map.

The long-term goal is to use it as a backing storage engine and to implement more advanced features on top of it via a layered architecture.
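
The project's own key-value map is not shown here. For readers unfamiliar with the idea, a very simple persistent key-value map can be sketched in Java on top of `java.util.Properties`: entries are loaded from a file when the map is opened, and the whole file is rewritten on every `put`. `SimplePersistentMap` is a hypothetical name and this is not the Sixth Data implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical minimal persistent key-value map (not the project's implementation). */
public class SimplePersistentMap {

    private final Path file;
    private final Properties data = new Properties();

    /** Opens the map, loading previously stored entries if the file exists. */
    SimplePersistentMap(Path file) {
        this.file = file;
        if (Files.exists(file)) {
            try (Reader in = Files.newBufferedReader(file)) {
                data.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Returns the stored value, or null if the key is absent. */
    String get(String key) {
        return data.getProperty(key);
    }

    /** Stores the value and immediately writes the complete map back to disk. */
    void put(String key, String value) {
        data.setProperty(key, value);
        try (Writer out = Files.newBufferedWriter(file)) {
            data.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Rewriting the whole file on every write is only acceptable for small data. A backing storage engine would more likely use an append-only log, or write to a temporary file and atomically replace the original.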

7 See also

Interesting or competing projects with good ideas:

- Flexible user interface building for interacting with different data:
  - Glamorous Toolkit: a moldable development environment. It is a live notebook. It is a flexible search interface. It is a fancy code editor. It is a software analysis platform. It is a data visualization engine. All in one.

7.1 Computation on multi-dimensional data

- Array Databases: Concepts, Standards, Implementations
- TileDB
  - Analyze and share complex multi-dimensional data at scale
- CM-1 Connection Machine
- Lisp-Stat: An environment for Statistical Computing

Author: Svjatoslav Agejenko

Created: 2021-04-01 Thu 19:11
