Sixth Data - Data storage and computing engine
1 General
- This program is free software, released under the Creative Commons Zero (CC0) license.
- Program author:
- Svjatoslav Agejenko
- Homepage: https://svjatoslav.eu
- Email: svjatoslav@svjatoslav.eu
- Other software projects hosted at svjatoslav.eu
1.1 Source code
- Download latest snapshot in TAR GZ format
- Browse Git repository online
- Clone Git repository using command:
git clone https://www2.svjatoslav.eu/git/sixth-data.git
- See JavaDoc.
2 Vision / goal
Provide a hackable, versioned, optimized, distributed, geometrical, arbitrary-dimensional (hypercube-based) data storage and computation engine, inspired by the brain, for the general-purpose visual computing environment called Sixth.
Because Lisp is a hackable, self-defined, programmable programming language, it would be used to provide imperative programming support.
3 Inspiration
- See also: OLAP cube.
3.1 Brain
- The brain appears to be a natural geometrical/parallel data storage and computation engine.
- Even more awesome is that the brain appears to operate, and to be wired, as an arbitrary/variable dimensional structure: https://singularityhub.com/2017/06/21/is-there-a-multidimensional-mathematical-world-hidden-in-the-brains-computation/
- On top of this, the multidimensional space that the brain represents has dynamic/variable resolution/density.
- Such properties allow parallel geometrical computation and fit the CM-1 Connection Machine architecture beautifully (for an extra hardware-accelerated solution).
3.2 CM-1 Connection Machine
https://en.wikipedia.org/wiki/Connection_Machine
A massively parallel machine (thousands of CPUs) connected via an internal 12-dimensional hypercube network can efficiently simulate an arbitrary-dimensional hypercube and network topology between computational units. For example, when solving or simulating a 5-dimensional problem, we can arrange the computational units into a virtual 5D network. See: http://www.mission-base.com/tamiko/theory/cm_txts/di-ch2.html
We can then pre-distribute data across the computation units and perform parallel geometrical computation.
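The hypercube wiring can be sketched in a few lines. In an n-dimensional hypercube network every node has an n-bit address and is directly linked to the n nodes whose addresses differ in exactly one bit. The class and method names below are hypothetical illustrations, not part of Sixth Data or of the CM-1 software. Embedding a virtual 5D cube is shown here, as one simple assumption, by using only the 5 lowest address bits of the 12D network.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of hypercube network addressing (not Sixth Data API). */
class HypercubeNetwork {

    /** Returns the addresses of all direct neighbors of a node. */
    static List<Integer> neighbors(int node, int dimensions) {
        List<Integer> result = new ArrayList<>();
        for (int bit = 0; bit < dimensions; bit++) {
            result.add(node ^ (1 << bit)); // flip exactly one address bit
        }
        return result;
    }

    /** Minimal number of hops between two nodes: the count of differing bits. */
    static int hops(int from, int to) {
        return Integer.bitCount(from ^ to);
    }

    public static void main(String[] args) {
        int physicalDimensions = 12; // 2^12 = 4096 node addresses
        int virtualDimensions = 5;   // a 5D problem uses the 5 lowest address bits

        System.out.println(neighbors(0, physicalDimensions).size()); // 12
        System.out.println(neighbors(0b00101, virtualDimensions));   // [4, 7, 1, 13, 21]
        System.out.println(hops(0b00000, 0b10101));                  // 3
    }
}
```

Because the virtual 5D neighbors are also physical neighbors in the 12D network, every step in the simulated topology costs a single hop.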
4 Reasons for hypercube as a first-class citizen
- The hypercube is a quite general-purpose data structure that naturally encapsulates a wide variety of data and problems.
- It nicely captures the apparent properties of the brain.
- Naturally supports distributed and parallel geometrical data storage and computation.
- Dedicated hardware like the CM-1 can be built around the hypercube concept, so that data, computation process and hardware all fit together beautifully while complementing each other's strengths.
- Hypercube-stored data (and the computation process) has geometry by its nature and should fit nicely with the "3D first" user interface ideology of the parent Sixth project.
5 Geometrical computation idea
5.1 Distributed computation and data storage
Many problems can be translated to geometry (use any shapes and as many dimensions as you need). Solutions to such problems can then be found via geometrical search, comparison and lookup. As a bonus, such geometrical *data storage* AND *computation* can naturally be made *parallel* and *distributed*.
Learning means building/updating/re-balancing the model (the hard part). Answering a question means making (relatively simple) lookups (geometrical queries) against the model.
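To make the learning versus lookup split concrete, here is a minimal sketch that treats the model as a set of labeled points and answers a question with a nearest-neighbor lookup. All names are hypothetical, the search is a plain brute-force scan, and nothing here reflects the actual Sixth Data implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a model made of labeled points, queried geometrically. */
class GeometricModel {
    private final List<double[]> points = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    /** "Learning": add a labeled point to the model. */
    void learn(String label, double... coordinates) {
        points.add(coordinates.clone());
        labels.add(label);
    }

    /** "Question answering": label of the stored point closest to the query. */
    String nearest(double... query) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < points.size(); i++) {
            double distance = squaredDistance(points.get(i), query);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = labels.get(i);
            }
        }
        return best;
    }

    /** Squared Euclidean distance; assumes both points have the same dimension count. */
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        GeometricModel model = new GeometricModel();
        model.learn("cold", 0, 0);
        model.learn("hot", 10, 10);
        System.out.println(model.nearest(1, 2)); // cold
        System.out.println(model.nearest(9, 8)); // hot
    }
}
```

Each distance comparison is independent of the others, which is why this kind of lookup can be split across many computation units.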
5.2 Mapping hypercube to object-oriented model and relational database
Object-oriented programming is inspired by the way the human mind operates. It allows the programmer to express ideas to the computer in more human-like terms.
It is also possible to map an object model and a relational database to geometrical hyperspace:
- An object or database table row is a point in the hypercube's arbitrary-dimensional space. Each object member variable or database table column can be mapped to its own dimension in the hypercube. That is: if a class declares 4 variables for an object, then the corresponding object can be stored as a single point inside a 4-dimensional hypercube. Variable values translate to point coordinates in that hypercube: numbers and strings can be translated to a linear value that can be used as a coordinate along a particular dimension.
- Each object class or database table declares its own hypercube that contains the instances (objects) of that class or the rows of that table.
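The mapping above can be sketched as follows. The Person class, the method names, and the particular way of turning a string into a linear value are all assumptions made for illustration; the project may choose a different encoding.

```java
/** Hypothetical sketch: mapping an object's member variables to hypercube coordinates. */
class ObjectToPoint {

    /** Example class with 4 member variables, so its instances live in a 4D hypercube. */
    static class Person {
        final String name;
        final int age;
        final double heightCm;
        final boolean employed;

        Person(String name, int age, double heightCm, boolean employed) {
            this.name = name;
            this.age = age;
            this.heightCm = heightCm;
            this.employed = employed;
        }
    }

    /**
     * Maps a string to a value in [0, 1). Only the first 3 characters are used,
     * and their lexicographic order is preserved (one possible encoding, assumed here).
     */
    static double stringToCoordinate(String s) {
        double value = 0;
        double scale = 1.0 / 65536; // one 16-bit char per "digit"
        int limit = Math.min(s.length(), 3);
        for (int i = 0; i < limit; i++) {
            value += s.charAt(i) * scale;
            scale /= 65536;
        }
        return value;
    }

    /** One coordinate per member variable. */
    static double[] toPoint(Person p) {
        return new double[] {
            stringToCoordinate(p.name),
            p.age,
            p.heightCm,
            p.employed ? 1 : 0
        };
    }

    public static void main(String[] args) {
        double[] point = toPoint(new Person("Alice", 30, 170.0, true));
        System.out.println(point.length); // 4
        System.out.println(stringToCoordinate("Alice") < stringToCoordinate("Bob")); // true
    }
}
```

An order-preserving string encoding keeps alphabetically close values geometrically close along that dimension, which is what makes range queries on string columns meaningful.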
5.3 Mapping entity relations in hypercube
Suppose we want to create a database of:
- Books.
- Authors.
- Effort: the amount of time contributed by each author to each book they wrote.
The information above can be represented as a 3D cube whose dimensions are:
- X: Book
- Y: Author
- Z: Effort
Points in that cube would nicely capture the many-to-many relations between authors and books.
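A minimal sketch of such a cube is shown below. Books and authors are represented by integer IDs and effort by hours; these choices, along with every class and method name, are assumptions for illustration only. Queries become slices of the cube along one dimension.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: many-to-many relations stored as points in a 3D cube. */
class BookAuthorCube {

    /** One point in the cube: X = book, Y = author, Z = effort in hours. */
    static class Point {
        final int book;
        final int author;
        final double effortHours;

        Point(int book, int author, double effortHours) {
            this.book = book;
            this.author = author;
            this.effortHours = effortHours;
        }
    }

    private final List<Point> points = new ArrayList<>();

    void add(int book, int author, double effortHours) {
        points.add(new Point(book, author, effortHours));
    }

    /** Slice the cube at X = book and collect the Y coordinates. */
    List<Integer> authorsOf(int book) {
        List<Integer> result = new ArrayList<>();
        for (Point p : points) {
            if (p.book == book) result.add(p.author);
        }
        return result;
    }

    /** Slice the cube at Y = author and sum the Z coordinates. */
    double totalEffort(int author) {
        double sum = 0;
        for (Point p : points) {
            if (p.author == author) sum += p.effortHours;
        }
        return sum;
    }

    public static void main(String[] args) {
        BookAuthorCube cube = new BookAuthorCube();
        cube.add(1, 10, 100); // author 10 spent 100 hours on book 1
        cube.add(1, 11, 50);  // author 11 spent 50 hours on book 1
        cube.add(2, 10, 30);  // author 10 spent 30 hours on book 2
        System.out.println(cube.authorsOf(1));    // [10, 11]
        System.out.println(cube.totalEffort(10)); // 130.0
    }
}
```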
6 Current status
- The vision / goal is more or less defined.
- Collected some inspiring ideas.
- Implemented a very simple persistent key-value map.
- The long-term goal is to use it as a backing storage engine and to implement more advanced features on top of it via a layered architecture.
7 See also
Interesting or competing projects with good ideas:
- Analyze and share complex multi-dimensional data at scale https://tiledb.com/
- ChrysaLisp
- Assembler/C-Script/Lisp 64 bit, MIMD, multi CPU, multi threaded, multi core, multi user Parallel OS. With GUI, Terminal, OO Assembler, Class libraries, C-Script compiler, Lisp interpreter, Debugger, and more…
- CM-1 Connection Machine
- Gemstone/S
- Completely distributed Smalltalk-based computing system.
- GRAKN.AI
- Database in the form of a knowledge graph that uses machine reasoning to simplify data processing challenges for AI applications. https://grakn.ai/
- http://phantomos.org/
- Programs run forever. System crash or reboot does not destroy state of running program.
- Magma
- Multi-user object database for Squeak
- Taichi: A Language for High-Performance Computation on Spatially Sparse Data Structures
- TAOS
- Completely distributed operating system/virtual machine: