
Designing for Common Comprehension: Pt2 - Melior via Discendi

  • blukashev
  • Oct 7
  • 8 min read

In part 1 of this series we covered the fractures, contentions, and mismatched edges between the disparate elements of the AI ecosystem today. Here we begin to address how to make "lemonAIde" from said lemons.

A Paradigm Shift is Required

Consider a clean-sheet approach to how we implement what we know about the algorithms and resources involved in conventional, ML, and quantum computation. Break everything down to its primitives and look at designing a computational system from the ground up as an idealized thought exercise... What if models were trained on a language which conformed to spec at the intrinsic level of its execution, abstract syntax tree (AST), and human-facing semantics? What if that language was used, in turn, to run those models, leveraging target-specific back-ends to execute the right parts on the right hardware? What if it could continually train models and have them write, in real time, the language on which they are trained? What if it ran across heterogeneous systems in a shared memory space, allowing communication to occur by (public) function/method/attribute calls on asynchronous actor-like objects instantiated in the AST, interacting with data, and compiled down to machine code in a manner those models understand with microsecond latency? What if it considered runtime and data security (integrity and proof of accurate execution, with boundary enforcement as the mechanism of proof, to include calls accessing data) a first-order priority from the moment it was conceived? What if it could natively handle the quantum calculations of "maybe"? What if it wasn't just one language? Sound ambitious? We think so, which is exactly why we're going to build it for everybody (and the disembodied) to use.


How?

  1. In order to accurately "think" about problem domains, vector similarity engines need to be supplied with distinct dimensions and clear bounds for those domains, both to differentiate the subject matter and to terminate evaluation of sequences which do not conform to the requisite specification - branch trimming of the logic tree. Eliminating general concepts, human language, and the complexity required to address those elements permits construction of expert micro-models (and MoEs thereof) to handle specific subject domains at a granular level which can easily be refined, and composition of intermediate or control planes from those MoEs, ensuring proper interface with the runtime at its current state of refinement. Specialization of the subject domains thus permits:

    1. End-to-end visibility through code transformation, dispatch, execution, and the returns/flow of a runtime which is trained into the observers in real time, allowing for self-improvement and refinement of the runtime in a way conventional JIT can't effect.

    2. Visibility of methods on objects/structs/traits providing accessor patterns far more secure and efficient than MCP, A2A, or whatever the fuzzy-IPC fad is by the time this text is read.

  2. Human beings similarly need clear comprehension of the system in order to innovate and improve upon it - languages they already know and understand, and in which the libraries they use are written, are practically prerequisite to gaining any traction.

    1. Multiple language front-ends must be handled consistently in this paradigm in order to gain adoption - the momentum of Python is difficult "to beat" but much easier to divert toward doing work more efficiently.

    2. Momentum in many cases comes at the price of mass over velocity, creating opportunity to enhance the language intrinsics and semantics while gaining adoption through full compatibility with "upstream."

  3. Execution dispatch as it's handled today, even on systems with CUDA, iGPU, and conventional CISC back-ends on tap, is the province of inferencing runtimes such as vLLM/vllm.rs/llama.cpp/etc. or the corresponding training frameworks (sometimes one and the same). A JIT-style code cache evaluating front-to-back can be much more informative as to the appropriate execution context available, based on bounds and runtime heuristics, than the decision mechanisms inferencing runtimes use today. Imbue the code cache with a parallel vector DB for RAG/runtime learning, and the guidance provided to what today is a JIT becomes far more encompassing than the local domain of optimization normally in its scope.

    1. That same code cache, enriched and vectorized, can be distributed easily to mask attention and prefill token caches for processing only the relevant parts of code being handled by the stack. In current terms, a sort of RAG or KV cache relevantly partitioned and accessible to authorized actors via native memory address space.

    2. Back-stepping within the cache to unwind optimizations step-wise, with reasoning stored in co-cached vectors, allows for considerate runtime modification without full recompilation of objects - reducing the latency of their execution after runtime refinement, as well as moving execution from back-ends like x86 to SIMD as optimizations break the code down to relevant maths better executed that way.

    3. Dynamically delegating workloads to systems handling varying levels of computational precision allows conventional binary systems to determine when quantum computation is to be effected, without humans being involved in a complex dispatch decision requiring prediction of whether conventional linalg/calculus solutions suffice or such offload is actually beneficial within the bounds of all resource constraints.

    4. Toolchains (compilers) do not necessarily have to output binary content - intermediate representations appropriate to execution targets are completely valid, and reasoning about their utilization, optimization, and scheduling/execution can occur just ahead of time through dynamic dispatch of resource context for availability and operational need to complete the intent.
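As a rough sketch of the code-cache idea in point 3 and its vectorized variant in 3.1, the following Python toy models cache entries that carry a co-cached embedding alongside per-backend latency heuristics; lookup is by vector similarity and dispatch picks the cheapest backend. The cosine lookup, cost fields, and backend names are stand-ins for the real heuristics, not a description of any existing runtime.

```python
# Illustrative sketch (not a real runtime): a JIT-style code cache whose
# entries carry an embedding vector alongside compiled artifacts, so a
# dispatcher can pick an execution back-end from bounds + runtime heuristics.
import math
from dataclasses import dataclass

@dataclass
class CacheEntry:
    name: str
    embedding: list[float]        # co-cached vector for RAG/runtime learning
    backends: dict[str, float]    # backend -> measured latency heuristic (ms)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class CodeCache:
    def __init__(self):
        self.entries: list[CacheEntry] = []

    def add(self, entry: CacheEntry):
        self.entries.append(entry)

    def nearest(self, query: list[float]) -> CacheEntry:
        # Vector lookup: find the cached kernel most similar to the query,
        # standing in for the "parallel vector DB" described above.
        return max(self.entries, key=lambda e: cosine(e.embedding, query))

    def dispatch(self, entry: CacheEntry) -> str:
        # Pick the backend with the lowest heuristic latency for this entry.
        return min(entry.backends, key=entry.backends.get)

cache = CodeCache()
cache.add(CacheEntry("gemm", [1.0, 0.0], {"cpu": 12.0, "cuda": 0.8}))
cache.add(CacheEntry("parse", [0.0, 1.0], {"cpu": 0.1, "cuda": 2.0}))

hit = cache.nearest([0.9, 0.1])
print(hit.name, cache.dispatch(hit))     # → gemm cuda
```

The interesting property is that the same structure supports the unwinding described in 3.2: because reasoning lives in co-cached vectors next to the artifact, stepping an entry back to an earlier optimization state is a cache operation, not a recompilation.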

  4. RDMA has existed for decades but is generally used as a form of primitive IO plumbing largely due to the need to coordinate locking and conventional access semantics across a distributed plane - gating through blocking protocols such as IB is a non-starter for optimistic dispatch. Complex applications natively leveraging the shared memory space are few and far between but the mechanisms to enable truly distributed application execution have existed in various forms for some time to include modern asynchronous schedulers and datapath state aware networks.

    1. Localization of coordination by memory region (host, device, NUMA, topology, etc.) with asynchronous dispatch in a proofed data-flow allows application elements to execute at tempo without blocking (or the usual forms of sleep). CSP, Actors, Futures, and DataFlow structures permit real-time construction of scheduling patterns at local tiers, with resolution by higher tiers or by peer authorities/quorum at the top as the runtime evolves.

    2. Integration with advanced fabric such as Broadcom's DNX, Enfabrica's PCI dataplane, or even composable solutions opens the doors to novel resource constructs previously unavailable even in bespoke HPC.
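The actor-like asynchronous objects described in point 4 can be approximated with today's tooling. Below is a minimal, illustrative asyncio sketch: public method calls are serialized through a mailbox, so concurrent callers never contend on the object's internal state and never block each other. Class and method names here are ours, not part of any proposed API, and a real distributed version would route the mailbox over the shared memory fabric rather than an in-process queue.

```python
# Minimal sketch of the actor-style pattern described above: objects whose
# public methods are invoked asynchronously through a mailbox, so callers
# never contend on the callee's internal state. Purely illustrative naming.
import asyncio

class Actor:
    def __init__(self):
        self._mailbox: asyncio.Queue = asyncio.Queue()
        self._task = None

    async def _run(self):
        while True:
            method, args, fut = await self._mailbox.get()
            if method is None:          # shutdown sentinel
                break
            fut.set_result(getattr(self, method)(*args))

    def start(self):
        self._task = asyncio.ensure_future(self._run())

    async def call(self, method: str, *args):
        # Public entry point: enqueue the call, await the result future.
        fut = asyncio.get_running_loop().create_future()
        await self._mailbox.put((method, args, fut))
        return await fut

    async def stop(self):
        await self._mailbox.put((None, (), None))
        await self._task

class Counter(Actor):
    def __init__(self):
        super().__init__()
        self.value = 0

    def add(self, n):                  # runs inside the actor's own task
        self.value += n
        return self.value

async def main():
    c = Counter()
    c.start()
    results = await asyncio.gather(*(c.call("add", 1) for _ in range(5)))
    await c.stop()
    return results

print(asyncio.run(main()))             # mailbox serializes access: [1, 2, 3, 4, 5]
```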

  5. Formal code/flow validation techniques and real-world implementation of runtime guards are far from theoretical - our colleagues over at Open Source Security have implemented such in the kernel down to private memory regions in ring0; to include prevention of direct and speculative access by ring0 itself, isolation of its kernel stacks, application of fine-grained forward and reverse CFI in deterministic and probabilistic defense models (which also happens to improve runtime performance in various cases due to layout optimization), and whitelisted gating of memory access between rings of privilege to include abnormal contexts such as VDSO and eBPF - just to name a few of the techniques they've implemented in production code. The industry can and should learn from them.

    1. Hardening of the runtime as a prime design goal permits leveraging the mechanics selected in the design phase as springboards for optimization in parallel or downstream elements of the stack - there is overhead in runtime and compile-time checking but there is also opportunity to drop branches of code uncompiled and execute only what can provably be allowed. JIT being aware of security boundaries only enhances its comprehension of viable optimization strategies.

    2. Data access separate from the executable contexts of stacks/warps can also be provably gated by ensuring that accessor intrinsics include permissions checks and a coherent framework for mandatory access control. At the language front-end layer this presents as public and private classes, methods, and valid resource back-ends. In the AST and below, however, the engine itself can be used to measure relative access graphs when determining exposure of context - objects, classes, methods, linking targets and symbols, etc. Conventional application-layer access controls then reside atop runtime controls/AAA to permit flow-through graphing of nodes and their edges in the evaluation of effective access rights prior to executing a fetch (to avoid speculative concerns).

    3. Intent and execution graphs bounding the training and code-generation mechanics ensure that as code itself evolves to unrecognizable states it is accurately marked with, and proofed for, adherence to stated intent.
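A toy rendering of the accessor-gating idea in 5.2: an accessor intrinsic that consults an access graph before any fetch occurs, so denial happens prior to touching data rather than after. The graph shape, subject/object naming, and policy model below are hypothetical simplifications - a real implementation would evaluate effective rights through intermediate nodes and edges, not just direct grants.

```python
# Hedged sketch of accessor-level mandatory access control: every fetch
# walks an access graph of (subject -> object) grants before any data is
# touched, so denial happens prior to the fetch (no speculative exposure).
class AccessGraph:
    def __init__(self):
        self._edges: dict[str, set[str]] = {}

    def grant(self, subject: str, obj: str):
        self._edges.setdefault(subject, set()).add(obj)

    def allowed(self, subject: str, obj: str) -> bool:
        # Effective rights could also flow through intermediate nodes;
        # direct edges keep this sketch small.
        return obj in self._edges.get(subject, set())

class GuardedStore:
    def __init__(self, graph: AccessGraph):
        self._graph = graph
        self._data: dict[str, object] = {}

    def put(self, obj: str, value):
        self._data[obj] = value

    def fetch(self, subject: str, obj: str):
        # Accessor intrinsic: the permission check IS the access path.
        if not self._graph.allowed(subject, obj):
            raise PermissionError(f"{subject} may not read {obj}")
        return self._data[obj]

graph = AccessGraph()
graph.grant("expert.arith", "weights.arith")
store = GuardedStore(graph)
store.put("weights.arith", [0.1, 0.2])

print(store.fetch("expert.arith", "weights.arith"))   # → [0.1, 0.2]
# store.fetch("expert.memops", "weights.arith")       # raises PermissionError
```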

  6. Use of translation layers between experts, actors, systems, and communication with human beings or the relevant front-end language construct would normally be an immense burden to maintain but in a paradigm such as this, they are an intrinsic byproduct of the compilation/training pipeline similar to swagger docs generated from an API.

    1. Separation of concern between communication and operational logic is commonplace in conventional computing but difficult in ML due to lack of relevant datasets upon which to train and then independently validate the results of said training. Embedding all dimensions of the ecosystem and intent into the models at each iteration provides a fairly stable corpus with progressive iteration in which to close the domain knowledge loop.
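The "swagger-like byproduct" idea in point 6 can be illustrated with plain Python introspection: an interface description generated mechanically from function signatures and docstrings, so the translation-layer document is a compilation artifact rather than a hand-maintained file. The functions and spec layout below are invented for the example.

```python
# Sketch of interface docs as a pipeline byproduct: a machine-readable
# spec derived from the callable surface itself, analogous to Swagger
# docs generated from an API. Function names here are illustrative.
import inspect

def add(x: int, y: int) -> int:
    """Add two bounded integers."""
    return x + y

def scale(v: list[float], k: float) -> list[float]:
    """Scale a vector by k."""
    return [x * k for x in v]

def describe(funcs) -> dict:
    # Build the interface spec from signatures + docstrings; nothing is
    # hand-written, so the spec cannot drift from the implementation.
    spec = {}
    for fn in funcs:
        sig = inspect.signature(fn)
        spec[fn.__name__] = {
            "doc": inspect.getdoc(fn),
            "params": {name: str(p.annotation)
                       for name, p in sig.parameters.items()},
            "returns": str(sig.return_annotation),
        }
    return spec

spec = describe([add, scale])
print(spec["add"]["params"])   # → {'x': "<class 'int'>", 'y': "<class 'int'>"}
```

In the paradigm described above, the same generation step would run at each compilation/training iteration, keeping expert-to-expert and expert-to-human translation layers current by construction.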

  7. ... details of the above and other key elements of the core systems architecture to be covered in subsequent posts.


What, When, "With What Army," and Most Importantly - Why?

Much of the conventional compute foundation already exists in the ashes of the Rubinius project or is being assembled atop them; the rest is actively being built over the coming months (at least toward an MVP to springboard adoption/communal development).


Semper Victus has retained the skills and services of Brian Shirai of the aforementioned Rubinius project at FTE to effect the Open Source implementation of this work. We have retained Silverkey's Elizabeth Wharton to help shepherd the organization around it, and we have called to arms the cabal of experts orbiting Semper Victus to an internal working group which began work some time ago. Venerable industry talent such as Norman Schibuk (helped bring us RISC), Tobias Ford (helped bring us SDN cloud), David Maynor (helped bring <redacted> and infosec awareness) are some of the talent advising direction and architecture in the foundational primitives from which we are building anew. People who've lived the mistakes of the past, innovated solutions for the present & future, and built a fair chunk of the digital world on which we rely are putting their expertise into this work. People armed with talent and tooling to overcome the original hurdles for the ahead-of-its-day VM and build a new modality of computing leveraging disparate classes of computational capacity natively and heterogeneously.


On the practical matter of funding: right now, it's just us - all told, this will probably run us the better part of a million to get off the ground... a "tangible commitment" for a company our size, but one we believe to be worth the investment. Sometimes you have to put your money where your mouth is, especially when you're talking about a paradigm shift; so we're coming to the matter correct.


The intent of this effort is to establish a standalone entity serving the public and systems as a central repository of knowledge and function upon which the ecosystem will be able to better build itself in every sense of the statement. We are actively working to channelize the structure and mechanisms of the organization to prevent the sorts of problems we've all seen Open Source face in recent years by incentivizing a common core library to which everybody commits and which can be used by everyone for education, production, and profit as relevant to their needs.


If we are correct about what we believe to be the eventual outcome of this work then we are doing it to help bridge the technology gap between people and systems enabling both to evolve more rapidly and overcome the various challenges we face through massive acceleration in problem-solving capacity. If we are not, then the tombstone of this project will be another brick in the foundation of what is to come.


What's Next?

The rest of this series will dive into the individual aspects of this design, their mechanisms and intent, and finally the sinews of logic binding them together in an asynchronous dance streamlining toward self-improvement. Our next post, Mixture of Experts, will start to cover the Intrinsic Language Model architecture and components, as well as a bit of reinforcement for readers about the team and why this isn't a moonshot for people who've helped us actually get to the moon.




©2023 by Semper Victus LLC, a veteran owned business.
