Where is science heading in 2023?
While everyone is talking about ChatGPT and how it may take over our jobs, there are cutting-edge technologies pushing the boundaries of science. These technologies have the potential to take over science in the next 10 years, and they should, if you ask me. This is, of course, not an exhaustive list. Sources are listed as hyperlinks.
Organoids
Organoids are cell cultures carefully engineered in the lab to imitate our organs as closely as possible. How is this useful? Organoids allow us to study and experiment on our cells more effectively, for example, how cells respond to pathogens and medications. How organoids can be “engineered” is a question being answered by several methods (read more here).
Molecular and Nanoscale Communication
Our cells constantly grow and die. Cell metabolism is the aggregate of chemical reactions that help cells consume, secrete, create, and break down chemical compounds. This happens through chains of reactions, both within the cell and across its boundary. Cell metabolism is what keeps cells alive, lets them replicate, and eventually die. Like organoids, molecular communication is a field of synthetic biology; it “models” cellular processes as bits and bytes, i.e., information. The pathways through which chemical reactions occur are the channels that carry the information from one cell to another (or to the outside). Who are the senders and receivers of this communication channel? The cells themselves!  Interest in molecular and nanoscale communication has declined in the last 2-3 years; however, newer ways to control cells are being actively explored (see organoids).
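To make the “cells as senders and receivers” idea concrete, here is a toy sketch of one common abstraction in this field: on-off keying, where a bit is encoded as the presence or absence of released molecules. Everything in this snippet, the concentration values, the additive-noise channel, and the threshold receiver, is a simplifying assumption of mine, not a model from any particular paper.

```python
import random

# Toy sketch of molecular communication as on-off keying (OOK):
# the sender "cell" releases molecules for a 1-bit and stays silent
# for a 0-bit; the receiver thresholds the (noisy) concentration it
# senses. All numbers and the channel model are illustrative assumptions.

RELEASE_CONCENTRATION = 100.0   # molecules released per 1-bit (assumed)
NOISE_STDDEV = 20.0             # channel/receiver noise (assumed)
THRESHOLD = 50.0                # decision threshold at the receiver

def transmit(bits):
    """Sender cell: map each bit to a molecule concentration."""
    return [RELEASE_CONCENTRATION if b else 0.0 for b in bits]

def channel(concentrations):
    """Diffusion and noise lumped into additive Gaussian noise (a big simplification)."""
    return [c + random.gauss(0, NOISE_STDDEV) for c in concentrations]

def receive(samples):
    """Receiver cell: threshold the sensed concentration back into bits."""
    return [1 if s > THRESHOLD else 0 for s in samples]

if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1]
    decoded = receive(channel(transmit(message)))
    print("sent:   ", message)
    print("decoded:", decoded)
```

Real models typically replace the lumped noise term with diffusion physics and reaction kinetics, but the sender-channel-receiver structure stays the same.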
How We Store and Process Data: Customize Storage for Your Computations
This is of particular interest to me. We mostly operate on row-based data formats. A simple example is a CSV file, where each row is a separate entry, also called a “data point”. Let's say you wrote a program that answers a particular query, say, “the age of every person named X whose DOB falls before 1970.” The program reads through all rows, checking the “name” column until it finds every person named “X”, then it checks whether the “DOB” column for those rows falls before 1970. For a small file, this is a quick process.
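As a minimal sketch, here is what that row-oriented lookup looks like in Python. The file name and column names (a hypothetical people.csv with name, age, and dob_year) are my own placeholders; the point is that the reader has to walk through every row, field by field, to answer a question about just two columns.

```python
import csv

# Row-oriented lookup: scan every row of a (hypothetical) people.csv
# and keep the ages of people named `name` born before `dob_before`.
def find_people_row_oriented(path, name, dob_before):
    matches = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):          # one pass over ALL rows and ALL fields
            if row["name"] == name and int(row["dob_year"]) < dob_before:
                matches.append(int(row["age"]))
    return matches

# Example usage (assuming such a file exists):
# find_people_row_oriented("people.csv", "X", 1970)
```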
Now imagine that your data contains millions of rows. Your code will spend most of its runtime reading row after row to find the matches for “X” and then “DOB”. Instead of performing computations on the data, your code spends most of its time in the I/O phase, looking for the data it needs to perform those computations. This does not scale well, does it? We can reduce the runtime of the search dramatically if we instead store the data by columns. Why is that?
To understand why it takes longer to search through the rows, we need to get to the bottom of how data is read from the disk using an I/O call. We can only read data from storage (i.e., the disk) in a continuous run. Take this example: suppose the letters of the English alphabet are stored on the disk in order, “A B C D E ….. Z”. If you wanted to find the letter “L”, you would have to start at “A” and read all the way until you reach “L”. Now apply the same logic to a row-based layout like a CSV: the values you care about (say, every “name”) are scattered across the file, interleaved with every other column, so a sequential read drags all of that unrelated data through I/O just to reach them. Now you see the problem.
What if we store the same data one column after another? That is, we store the first column name followed by all of its values; then the second column name followed by all of its values, and so on. When the code searches such a layout, it only needs to read the block holding the column it cares about, and that block sits contiguously on disk, so it can be read in one continuous run. In technical terms, this takes a single seek. A seek is the positioning step of an I/O call, and I/O calls are how your code reads from the disk. Reducing the number of seeks, and consequently the I/O calls, dramatically speeds up the data lookup.
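Here is the same hypothetical data laid out column by column, as a rough in-memory sketch rather than a real on-disk format (real columnar formats like Parquet or ORC add compression, metadata, and chunking on top of this idea):

```python
# Columnar sketch: each column is one contiguous list (on disk, one
# contiguous block). Column names mirror the hypothetical CSV above.
columns = {
    "name":     ["X", "Y", "X", "Z"],
    "dob_year": [1965, 1980, 1990, 1955],
    "age":      [58, 43, 33, 68],
}

def find_people_columnar(cols, name, dob_before):
    # Only the columns the query mentions are read; any other column
    # would never be touched at all.
    names, dobs, ages = cols["name"], cols["dob_year"], cols["age"]
    return [ages[i] for i in range(len(names))
            if names[i] == name and dobs[i] < dob_before]

print(find_people_columnar(columns, "X", 1970))   # -> [58]
```

Notice that “age” values are only looked up for rows that already matched, and columns the query never mentions are never read, which is where the I/O savings come from.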
There are nuances that I did not cover in this write-up. Row-based layout may be faster for certain types and sizes of data. That is for another day and time. It is all about customizing the storage for your computation needs!
You can look forward to a series of hands-on articles on this subject.
Remote and Smart-city Testbeds
Intent-based Networks
Intent-based Networking (IBN) is like ChatGPT but for network management, and a lot more technical. You can find the RFC and a draft RFC online. Why is IBN important? For one, it will make network operation easier. Network operators, for example those who manage routers at the internet backbone (think of Cisco), can convey their “intent” to the software running on these routers, and the routers implement the corresponding network operations, like changing a routing path between two nodes. I use the term “nodes” loosely here; it could refer to a router, a switch, a server sitting in some university, etc.
When I say IBN is like ChatGPT for networks, it is just a tongue-in-cheek comparison. There is an overlap between the two, though: both use artificial intelligence and/or machine learning to get the job done. This is true to a lesser extent for IBN, however; IBN can use AI/ML to interpret the network operator’s intent, but it does not have to.
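To give a feel for what “conveying an intent” might look like, here is a toy sketch. The intent schema and the tiny “controller” that compiles it into device-level steps are entirely made up for illustration; real IBN systems define their own models (see the RFC and draft mentioned above).

```python
# Hypothetical intent: the operator states WHAT they want, and the IBN
# controller works out HOW to configure the devices. Field names here
# are invented for illustration only.
intent = {
    "type": "connectivity",
    "from": "campus-router-A",
    "to": "datacenter-switch-B",
    "constraints": {"max_latency_ms": 10, "encrypted": True},
}

def compile_intent(intent):
    """Translate the high-level intent into low-level, device-specific steps."""
    steps = [
        f"compute a path {intent['from']} -> {intent['to']} "
        f"with latency <= {intent['constraints']['max_latency_ms']} ms",
        "push forwarding rules to every router on that path",
    ]
    if intent["constraints"].get("encrypted"):
        steps.append("set up an encrypted tunnel between the endpoints")
    return steps

for step in compile_intent(intent):
    print(step)
```

The operator never writes the per-router commands; that translation, possibly assisted by AI/ML, is the controller’s job.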
Why should you be interested in IBN? Cisco, Juniper Networks, VMware, and even academic networking projects are investing heavily in IBN in the hope of pushing networking into the future.
If you are interested in learning about and playing with IBN, you can use an open-source extension of the P4 programming language called P4IO (source code is here; paper is here).
P4 is a large ecosystem for full network deployment. It uses the software-defined networking (SDN) paradigm, which divides the network into two separate planes: a control plane and a data plane. SDN and its details are for another day; you can refer to this 2009 article. Things have changed radically in SDN since that article, but it gives you a high-level view of SDN’s design. You may also want to look at the simulation software written on top of P4: PFPSIM; P4lang (look for the mininet section in README.md for simulation-specific details); pfpgen, pfpsim, and pfpdb; and NS-4 (see the paper here).
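As a rough intuition for how the two planes split the work, here is a toy Python analogy of the match-action tables that P4 programs describe. This is not P4 syntax and glosses over details like longest-prefix matching; it only shows the division of labor: the control plane installs rules, the data plane matches packets against them.

```python
# Toy match-action table: control plane fills it, data plane uses it.
forwarding_table = {}   # destination prefix -> output port

def install_rule(dst_prefix, out_port):
    """Control-plane operation: install a forwarding entry."""
    forwarding_table[dst_prefix] = out_port

def forward(packet):
    """Data-plane operation: match the packet and apply an action."""
    for prefix, port in forwarding_table.items():
        if packet["dst"].startswith(prefix):   # simplistic stand-in for prefix matching
            return f"send {packet['id']} out of port {port}"
    return f"drop {packet['id']}"              # default action: drop

install_rule("10.0.1.", 1)
install_rule("10.0.2.", 2)
print(forward({"id": "pkt-1", "dst": "10.0.1.7"}))     # -> port 1
print(forward({"id": "pkt-2", "dst": "192.168.0.5"}))  # -> drop
```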
Quantum Networks
No, not quantum computing :)