Tag: concurrency

Implementing Async Features in Python – A Step-by-step Guide

Asynchronous programming is a feature of modern programming languages that lets an application start several operations without waiting for each one to finish before moving on. Asynchronicity is one of the big reasons for the popularity of Node.js.

We have discussed Python’s asynchronous features as part of our previous post: an introduction to asynchronous programming in Python. This blog is a natural progression on the same topic. We are going to discuss async features in Python in detail and look at some hands-on examples.

Consider a traditional web scraping application that needs to open thousands of network connections. We could open one network connection, fetch the result, and then move to the next ones iteratively. This approach increases the latency of the program. It spends a lot of time opening a connection and waiting for others to finish their bit of work.

Async, on the other hand, lets you open thousands of connections at once and switch among them as each one finishes and returns its result. Basically, it sends a request on one connection and moves to the next one instead of waiting for the previous response. It continues like this until all the connections have returned their output.

Source: phpmind

From the above chart, we can see that using synchronous programming on four tasks took 45 seconds to complete, while in asynchronous programming, those four tasks took only 20 seconds.
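
The same effect can be reproduced in miniature. The sketch below is not the benchmark behind the chart; it uses asyncio.sleep as a stand-in for a slow network call, and the names fake_request, run_sequentially, run_concurrently, and timed are made up for illustration.

```python
import asyncio
import time


async def fake_request(delay):
    # asyncio.sleep stands in for waiting on a network response (assumption).
    await asyncio.sleep(delay)
    return delay


async def run_sequentially(delays):
    # Await each request before starting the next one.
    return [await fake_request(d) for d in delays]


async def run_concurrently(delays):
    # Start every request, then wait for all of them together.
    return await asyncio.gather(*(fake_request(d) for d in delays))


def timed(coro_func, delays):
    # Run one strategy on a fresh event loop and measure the wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(coro_func(delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.1, 0.2, 0.3, 0.4]
    _, sequential = timed(run_sequentially, delays)
    _, concurrent = timed(run_concurrently, delays)
    print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

The sequential version should take roughly the sum of the delays, while the concurrent version should take roughly as long as the slowest single delay.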

Where Does Asynchronous Programming Fit in the Real-world?

Asynchronous programming is best suited to scenarios such as:

1. The program takes too much time to execute.

2. The delay comes from waiting on input or output operations, not from computation.

3. The task has multiple input or output operations that need to run at once.

And application-wise, these are the example use cases:

Web Scraping
Network Services

Difference Between Parallelism, Concurrency, Threading, and Async IO

Because we discussed this comparison in detail in our previous post, we will just quickly go through the concept as it will help us with our hands-on example later.

Parallelism involves performing multiple operations at the same time. Multiprocessing is an example of it. It is well suited to CPU-bound tasks.

Concurrency is slightly broader than Parallelism. It involves multiple tasks running in an overlapping manner.

Threading – a thread is a separate flow of execution. One process can contain multiple threads, and each thread runs independently. It is ideal for IO-bound tasks.

Async IO is a single-threaded, single-process design that uses cooperative multitasking. In simple words, async IO gives a feeling of concurrency despite using a single thread in a single process.
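
To make the "single thread, cooperative multitasking" point concrete, here is a minimal sketch. The worker and main names are hypothetical; each worker records the thread it runs on and then yields with asyncio.sleep(0), so the two workers take turns on one thread.

```python
import asyncio
import threading


async def worker(name, log):
    for step in range(2):
        # Record who ran, at which step, and on which thread.
        log.append((name, step, threading.get_ident()))
        # Voluntarily hand control back to the event loop.
        await asyncio.sleep(0)


async def main():
    log = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log


if __name__ == "__main__":
    for name, step, thread_id in asyncio.run(main()):
        print(name, step, thread_id)
```

The workers interleave (A, B, A, B) even though every entry carries the same thread id: nothing runs in parallel, the coroutines simply give way to each other at each await.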

Fig: A comparison of concurrency and parallelism

Components of Async IO Programming

Let’s explore the various components of Async IO in depth. We will also look at example code to help us understand the implementation.

1. Coroutines

Coroutines are a generalized form of subroutines. They are used for cooperative multitasking and behave much like Python generators.

A coroutine is defined with async def. Inside it, the await keyword suspends the coroutine and hands the flow of control back to the event loop until the awaited operation completes.

To run a coroutine, we need to schedule it on the event loop. Once scheduled, a coroutine is wrapped in a Task, which is a kind of Future object.

Example:

In the snippet below, we call async_func from the main function. We have to add the await keyword when calling the async function. As you can see, calling async_func without await only creates a coroutine object; its body never runs.

import asyncio
async def async_func():
    print('Velotio ...')
    await asyncio.sleep(1)
    print('... Blog!')

async def main():
    async_func()#this will do nothing because coroutine object is created but not awaited
    await async_func()

asyncio.run(main())

Output

RuntimeWarning: coroutine 'async_func' was never awaited
 async_func()#this will do nothing because coroutine object is created but not awaited
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Velotio ...
... Blog!

2. Tasks

Tasks are used to schedule coroutines concurrently.

When submitting a coroutine to an event loop for processing, you can get a Task object, which provides a way to control the coroutine’s behavior from outside the event loop.

Example:

In the snippet below, we create a task using create_task (a built-in function of the asyncio library) and then await it.

import asyncio


async def async_func():
    print('Velotio ...')
    await asyncio.sleep(1)
    print('... Blog!')


async def main():
    # Wrap the coroutine in a Task so the event loop schedules it right away.
    task = asyncio.create_task(async_func())
    await task


asyncio.run(main())

Output

Velotio ...
... Blog!
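
A Task can also be controlled from the outside, for instance by cancelling it. The following is a minimal sketch of that idea; slow_job is a hypothetical coroutine and the sleep durations are arbitrary.

```python
import asyncio


async def slow_job():
    # Pretend to do ten seconds of IO-bound work.
    await asyncio.sleep(10)
    return "finished"


async def main():
    task = asyncio.create_task(slow_job())
    await asyncio.sleep(0.01)  # give the task a chance to start
    print("done before cancel:", task.done())
    task.cancel()  # request cancellation from outside the coroutine
    try:
        await task
    except asyncio.CancelledError:
        print("task was cancelled")
    return task.cancelled()


if __name__ == "__main__":
    asyncio.run(main())
```

Awaiting a cancelled task raises asyncio.CancelledError, which is why the await sits inside a try block.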

3. Event Loops

This mechanism runs coroutines until they complete. You can imagine it as a while True loop that monitors coroutines, tracks which ones are idle, and looks around for things that can be executed in the meantime.

It can wake up an idle coroutine when whatever that coroutine is waiting on becomes available.
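
To build some intuition for that loop, here is a toy round-robin scheduler built on plain generators. It is only an illustration of the idea, not how asyncio is implemented (the real event loop also watches sockets and timers); countdown and run_all are made-up names.

```python
from collections import deque


def countdown(name, n, log):
    while n > 0:
        log.append(f"{name}:{n}")
        n -= 1
        yield  # hand control back to the loop


def run_all(coroutines):
    ready = deque(coroutines)
    while ready:
        current = ready.popleft()
        try:
            next(current)  # run until the next yield
        except StopIteration:
            continue  # this one is finished, drop it
        ready.append(current)  # otherwise queue it for another turn


if __name__ == "__main__":
    log = []
    run_all([countdown("A", 2, log), countdown("B", 1, log)])
    print(log)
```

Each generator runs until it yields, and the loop then picks the next one that is ready, which is the same turn-taking the event loop performs for coroutines.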

Only one event loop can run at a time in a given thread.

Example:

In the snippet below, we create three tasks with the event loop’s create_task method, collect them in a list, and run them concurrently by awaiting asyncio.wait. The loop itself is created with new_event_loop and driven with run_until_complete.

import asyncio


async def async_func(task_no):
    print(f'{task_no} :Velotio ...')
    await asyncio.sleep(1)
    print(f'{task_no}... Blog!')


async def main():
    # Ask for the loop that is running this coroutine instead of relying on a global.
    loop = asyncio.get_running_loop()
    taskA = loop.create_task(async_func('taskA'))
    taskB = loop.create_task(async_func('taskB'))
    taskC = loop.create_task(async_func('taskC'))
    await asyncio.wait([taskA, taskB, taskC])


if __name__ == "__main__":
    # Create the loop explicitly; calling get_event_loop() with no running loop is deprecated.
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.close()

Output

taskA :Velotio ...
taskB :Velotio ...
taskC :Velotio ...
taskA... Blog!
taskB... Blog!
taskC... Blog!

4. Future

A Future is a special low-level awaitable object that represents the eventual result of an asynchronous operation.

When a Future object is awaited, the coroutine will wait until the Future is resolved in some other place.

We will look into the sample code for Future objects in the next section.
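
Before that, here is a minimal standalone sketch of a Future being resolved "in some other place". The resolve_later coroutine and the "payload" value are hypothetical; create_future and set_result are standard asyncio calls.

```python
import asyncio


async def resolve_later(fut, value):
    # Some other piece of code produces the result a little later.
    await asyncio.sleep(0.01)
    fut.set_result(value)


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()  # an empty placeholder for a result
    resolver = asyncio.create_task(resolve_later(fut, "payload"))
    result = await fut  # suspends here until set_result is called
    await resolver
    return result


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Application code rarely creates Future objects by hand like this; they mostly appear indirectly, through Tasks and library calls.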

A Comparison Between Multithreading and Async IO

Before we get to Async IO, let’s use multithreading as a benchmark and then compare them to see which is more efficient.

For this benchmark, we will be fetching data from a sample URL (the Velotio Career webpage) a varying number of times: once, 10 times, 50 times, 100 times, and 500 times.

We will then compare the time taken by both of these approaches to fetch the required data.

Implementation

Code of Multithreading:

import requests
import time
from concurrent.futures import ProcessPoolExecutor


def fetch_url_data(pg_url):
    try:
        resp = requests.get(pg_url)
    except Exception as e:
        # Report the failure and fall through, returning None for this URL.
        print(f"Error occurred while fetching data from url {pg_url}: {e}")
    else:
        return resp.content


def get_all_url_data(url_list):
    # Note: ProcessPoolExecutor runs workers in separate processes;
    # ThreadPoolExecutor offers the same interface backed by threads.
    with ProcessPoolExecutor() as executor:
        resp = executor.map(fetch_url_data, url_list)
    return resp


if __name__ == '__main__':
    url = "https://www.velotio.com/careers"
    for ntimes in [1, 10, 50, 100, 500]:
        start_time = time.time()
        responses = get_all_url_data([url] * ntimes)
        print(f'Fetch total {ntimes} urls and process takes {time.time() - start_time} seconds')

Output

Fetch total 1 urls and process takes 1.8822264671325684 seconds
Fetch total 10 urls and process takes 2.3358211517333984 seconds
Fetch total 50 urls and process takes 8.05638575553894 seconds
Fetch total 100 urls and process takes 14.43302869796753 seconds
Fetch total 500 urls and process takes 65.25404500961304 seconds

ProcessPoolExecutor is a class in Python’s concurrent.futures module that implements the Executor interface. Note that it runs its workers in separate processes rather than threads; ThreadPoolExecutor exposes the same interface backed by threads. The fetch_url_data function fetches the data from the given URL using the requests package, and the get_all_url_data function maps fetch_url_data over the list of URLs.
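
Since the code above uses ProcessPoolExecutor, here is what a thread-based variant of the same pattern could look like. This is only a sketch and was not used for the timings above: fake_fetch uses time.sleep as a stand-in for the network call, and the max_workers value of 8 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(delay):
    # time.sleep stands in for a blocking network request (assumption).
    time.sleep(delay)
    return delay


def fetch_all(delays, max_workers=8):
    # Same Executor interface as ProcessPoolExecutor, but backed by threads.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() collects the results in the same order as the inputs.
        return list(executor.map(fake_fetch, delays))


if __name__ == "__main__":
    start = time.perf_counter()
    results = fetch_all([0.1] * 4)
    print(results, f"{time.perf_counter() - start:.2f}s")
```

Because the threads spend their time waiting rather than computing, the four simulated requests overlap and the total stays close to the duration of a single request.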

Async IO Programming Example:

import asyncio
import time
from aiohttp import ClientSession


async def fetch_url_data(session, url):
    try:
        async with session.get(url, timeout=60) as response:
            resp = await response.read()
    except Exception as e:
        # Log the error and return None for this request.
        print(e)
        return None
    else:
        return resp


async def fetch_async(loop, r):
    url = "https://www.velotio.com/careers"
    tasks = []
    # One session is shared by every request.
    async with ClientSession() as session:
        for i in range(r):
            # ensure_future schedules the coroutine on the loop and returns a Task.
            task = asyncio.ensure_future(fetch_url_data(session, url))
            tasks.append(task)
        # gather waits for all tasks and returns their results in order.
        responses = await asyncio.gather(*tasks)
    return responses


if __name__ == '__main__':
    for ntimes in [1, 10, 50, 100, 500]:
        start_time = time.time()
        # get_event_loop() is the older entry point; newer code usually calls asyncio.run().
        loop = asyncio.get_event_loop()
        future = asyncio.ensure_future(fetch_async(loop, ntimes))
        loop.run_until_complete(future)  # runs until the future finishes or raises an error
        responses = future.result()
        print(f'Fetch total {ntimes} urls and process takes {time.time() - start_time} seconds')

Output

Fetch total 1 urls and process takes 1.3974951362609863 seconds
Fetch total 10 urls and process takes 1.4191942596435547 seconds
Fetch total 50 urls and process takes 2.6497368812561035 seconds
Fetch total 100 urls and process takes 4.391665458679199 seconds
Fetch total 500 urls and process takes 4.960426330566406 seconds

We use the get_event_loop function to obtain the event loop that runs the tasks. To fetch more than one URL, we schedule each request with ensure_future and collect the results with the gather function.

The fetch_async function schedules the tasks on the event loop, and the fetch_url_data function reads the data from the URL using the shared aiohttp session. The future.result() call returns the responses of all the tasks.

Results:

As you can see from the plot, async programming is much more efficient than multi-threading for the program above.

The graph of the multithreading program looks roughly linear, while the graph of the asyncio program looks closer to logarithmic.

Conclusion

As we saw in our experiment above, Async IO made more efficient use of concurrency and performed better than multithreading.

Async IO can be beneficial in applications that can exploit concurrency. Whether to choose Async IO over other implementations, though, depends on the kind of application we are dealing with.

We hope this article helped further your understanding of the async feature in Python and gave you some quick hands-on experience using the code examples shared above.

December 12, 2022

Getting Started With Golang Channels! Here’s Everything You Need to Know
We live in a world where speed is important. With cutting-edge technology coming into the telecommunications and software industry, we expect to get things done quickly. We want to develop applications that are fast, can process high volumes of data and requests, and keep the end-user happy.

This is great, but of course, it’s easier said than done. That’s why concurrency and parallelism are important in application development. We must process data as fast as possible. Every programming language has its own way of dealing with this, and we will see how Golang does it.

Now, many of us choose Golang because of its concurrency support, which is built around goroutines and channels.

This blog will cover channels and how they work internally, as well as their key components. To benefit the most from this content, it will help to know a little about goroutines and channels as this blog gets into the internals of channels. If you don’t know anything, then don’t worry, we’ll be starting off with an introduction to channels, and then we’ll see how they operate.

What are channels?

Normally, when we talk about channels, we think of the ones in applications like RabbitMQ, Redis, AWS SQS, and so on. Anyone with little or no Golang knowledge would think like this. But channels in Golang are different from a work queue system. In work queue systems like those above, there are TCP connections to the channels, but in Go, a channel is a data structure or even a design pattern, which we’ll explain later. So, what exactly are channels in Golang?

Channels are the medium through which goroutines can communicate with each other. In simple terms, a channel is a pipe that allows a goroutine to either put or read the data.

What are goroutines?

So, a channel is a communication medium for goroutines. Now, let’s give a quick overview of what goroutines are. If you know this already, feel free to skip this section.

Technically, a goroutine is a function that executes independently and concurrently with other code. In simple terms, it’s a lightweight thread that’s managed by the Go runtime.

You can create a goroutine by putting the go keyword before a function call.

Let’s say there’s a function called PrintHello, like this:
```
func PrintHello() {
   fmt.Println("Hello")
}
```
You can run this as a goroutine simply by prefixing the call with go, as below:
```
//create goroutine
 go PrintHello()
```
Now, let’s head back to channels, as that’s the important topic of this blog.

How to define a channel?

Let’s see a syntax that will declare a channel. We can do so by using the chan keyword provided by Go.

You must specify the element type, as a channel can only carry values of that one type.
```
//create channel
 var c chan int
```
Very simple! But this is not useful yet, since it only declares a nil channel. Let’s print it and see.
```
fmt.Println(c)
fmt.Printf("Type of channel: %T", c)
<nil>
Type of channel: chan int
```
As you can see, we have only declared the channel, so we can’t transport data through it (a send or receive on a nil channel blocks forever). So, to create a useful channel, we must use the make function.
```
//create channel
c := make(chan int)
fmt.Printf("Type of `c`: %T\n", c)
fmt.Printf("Value of `c` is %v\n", c)
 
Type of `c`: chan int
Value of `c` is 0xc000022120
```
As you may notice here, the value of c is a memory address. Keep in mind that a channel value is essentially a pointer. That’s why we can pass channels to goroutines and still put data into, or read data from, the same channel. Now, let’s quickly see how to read and write data on a channel.

Read and write operations on a channel:

Go provides an easy way to read and write data on a channel by using the arrow operator, <-.
```
c <- 10
```
This is the simple syntax to put a value into our created channel. The same arrow appears in the type of a send-only channel, chan<- int.

And to get/read the data from the channel, we do this:
```
<-c
```
The arrow also appears in the type of a receive-only channel, <-chan int.

Let’s see a simple program to use the channels.
```
func printChannelData(c chan int) {
   fmt.Println("Data in channel is: ", <-c)
}
```
This simple function just prints whatever data is in the channel. Now, let’s see the main function that will push the data into the channel.
```
func main() {
   fmt.Println("Main started...")
   //create channel of int
   c := make(chan int)
   // call to goroutine
   go printChannelData(c)
   // put the data in channel
   c <- 10
   fmt.Println("Main ended...")
}
```
This yields the output:
```
Main started...
Data in channel is:  10
Main ended...
```
Let’s talk about the execution of the program.

1. We declared a printChannelData function, which accepts a channel c of data type integer. In this function, we are just reading data from channel c and printing it.

2. The main function first prints “Main started...” to the console.

3. Then, we create the channel c of data type integer using the make function.

4. We now pass the channel to the function printChannelData and, because the call is prefixed with go, it runs as a goroutine.

5. At this point, there are two goroutines. One is the main goroutine, and the other is the one we have just started.

6. Now, we put 10 as data into the channel. At this point, our main goroutine is blocked and waiting for some other goroutine to read the data. The reader, in this case, is the printChannelData goroutine, which was previously blocked because there was no data in the channel. Now that we’ve pushed the data onto the channel, the Go scheduler (more on this later in the blog) schedules the printChannelData goroutine, which reads and prints the value from the channel.

7. After that, the main goroutine becomes active again, prints “Main ended...”, and the program stops.

So, what’s happening here? Basically, the Go scheduler blocks and unblocks goroutines. You can’t read from a channel unless there’s data in it, which is why our printChannelData goroutine was blocked in the first place. Likewise, on this kind of channel the written data has to be read before the writer can resume, which is what happened to our main goroutine.

With this, let’s see how channels operate internally.

Internals of channels:

Until now, we have seen how to define a goroutine, how to declare a channel, and how to read and write data through a channel with a very simple example. Now, let’s look at how Go handles this blocking and unblocking nature internally. But before that, let’s quickly see the types of channels.

Types of channels:

There are two basic types of channels: buffered channels and unbuffered channels. The above example illustrates the behaviour of unbuffered channels. Let’s quickly see the definition of these:
- Unbuffered channel: This is what we have seen above. An unbuffered channel has no capacity to store data, so every send has to be received by another goroutine before the sender can continue. That’s why our main goroutine got blocked when we added data into the channel.
- Buffered channel: In a buffered channel, we specify the data capacity of the channel. The syntax is very simple: c := make(chan int, 10). The second argument to the make function is the capacity of the channel, so we can put up to ten elements into it without blocking. When the buffer is full, the next send blocks until a receiver goroutine consumes an element.

Properties of a channel:

A channel does a lot of things internally, and it has the properties below:
- Channels are goroutine-safe.
- Channels can store and pass values between goroutines.
- Channels provide FIFO semantics.
- Channels cause goroutines to block and unblock, which we just learned about.
As we look at the internals of a channel, you’ll learn about the first three properties.

Channel Structure:

As we learned in the definition, a channel is a data structure. Looking at the properties above, we want a mechanism that handles goroutines in a synchronized manner and with FIFO semantics. This can be solved using a queue with a lock, and a channel internally behaves in that fashion. It has a circular queue, a lock, and some other fields.

When we write c := make(chan int, 10), Go creates a channel using the hchan struct, which has the following fields:
```
type hchan struct {
   qcount   uint           // total data in the queue
   dataqsiz uint           // size of the circular queue
   buf      unsafe.Pointer // points to an array of dataqsiz elements
   elemsize uint16
   closed   uint32
   elemtype *_type // element type
   sendx    uint   // send index
   recvx    uint   // receive index
   recvq    waitq  // list of recv waiters
   sendq    waitq  // list of send waiters
 
   // lock protects all fields in hchan, as well as several
   // fields in sudogs blocked on this channel.
   //
   // Do not change another G's status while holding this lock
   // (in particular, do not ready a G), as this can deadlock
   // with stack shrinking.
   lock mutex
}
```
(Above info taken from Golang.org)

This is what a channel is internally. Let’s see one-by-one what these fields are.

qcount holds the count of items/data in the queue.

dataqsiz is the size of the circular queue. This is used in the case of buffered channels and is the second parameter passed to the make function.

elemsize is the size of a single element in the channel.

buf is the actual circular queue where the data is stored when we use buffered channels.

closed indicates whether the channel is closed. The syntax to close the channel is close(<channel_name>). The value of this field is 0 when the channel is created, and it’s set to 1 when the channel is closed.

sendx and recvx indicate the current send and receive positions in the buffer or circular queue. As we add data to the buffered channel, sendx advances, and as we receive, recvx advances; each wraps back to 0 when it reaches the end of the buffer.

recvq and sendq are the waiting queues for the blocked goroutines that are trying to either read data from or write data to the channel.

lock is a mutex that is held during each read or write operation, so that goroutines never modify the channel’s state at the same time.

These are the important fields of the hchan struct, which comes into the picture when we create a channel. The hchan struct resides on the heap, and the make function gives us a pointer to that location. There’s another struct known as sudog, which also comes into the picture, but we’ll learn more about that later. Now, let’s see what happens when we write and read the data.

Read and write operations on a channel:

We are considering buffered channels here. When one goroutine, let’s say G1, wants to write data onto a channel, it does the following:
- Acquire the lock: As we saw before, if we want to modify the channel, or hchan struct, we must acquire a lock. So, G1, in this case, will acquire the lock before writing the data.
- Perform the enqueue operation: We now know that buf is actually a circular queue that holds the data. Before enqueuing, the goroutine does a memory copy operation on the data and puts the copy into the buffer slot. We will see an example of this.
- Release the lock: After performing the enqueue operation, it releases the lock and carries on with its execution.
When another goroutine, let’s say G2, reads the above data, it performs the same steps, except that instead of an enqueue it performs a dequeue, again with a memory copy operation. This means that the goroutines share no memory other than the hchan struct, which is protected by the mutex. Everything else is a copy.

This satisfies the famous Golang proverb: “Do not communicate by sharing memory; instead, share memory by communicating.”
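
The three steps above (lock, copy into the circular queue, unlock) can be modelled in a few lines. The sketch below is a toy written in Python, the language of the earlier examples on this page, and not the Go runtime’s implementation: ToyChannel, try_send, and try_recv are made-up names, field names are borrowed from hchan only for readability, and where a real channel would park the goroutine the toy simply returns False.

```python
import copy
import threading


class ToyChannel:
    """A toy model of a buffered channel: a circular queue guarded by a lock."""

    def __init__(self, capacity):
        self.buf = [None] * capacity  # the circular queue
        self.dataqsiz = capacity      # size of the circular queue
        self.qcount = 0               # number of items currently queued
        self.sendx = 0                # next slot to write
        self.recvx = 0                # next slot to read
        self.lock = threading.Lock()

    def try_send(self, value):
        with self.lock:  # acquire the lock, release it on exit
            if self.qcount == self.dataqsiz:
                return False  # buffer full: a real channel would block the sender
            self.buf[self.sendx] = copy.copy(value)  # store a copy, not the original
            self.sendx = (self.sendx + 1) % self.dataqsiz  # wrap around
            self.qcount += 1
            return True

    def try_recv(self):
        with self.lock:
            if self.qcount == 0:
                return False, None  # buffer empty: a real channel would block the receiver
            value = self.buf[self.recvx]
            self.buf[self.recvx] = None
            self.recvx = (self.recvx + 1) % self.dataqsiz
            self.qcount -= 1
            return True, value


if __name__ == "__main__":
    ch = ToyChannel(2)
    print(ch.try_send(1), ch.try_send(2), ch.try_send(3))
    print(ch.try_recv(), ch.try_recv())
```

The third send fails because the two-slot buffer is full, and the receives hand the values back in the order they were sent, which is the FIFO behaviour listed among the channel properties.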

Now, let’s look at a small example of this memory copy operation.
```
func printData(c chan *int) {
   time.Sleep(time.Second * 3)
   data := <-c
   fmt.Println("Data in channel is: ", *data)
}
 
func main() {
   fmt.Println("Main started...")
   var a = 10
   b := &a
   //create channel
   c := make(chan *int)
   go printData(c)
   fmt.Println("Value of b before putting into channel", *b)
   c <- b
   a = 20
   fmt.Println("Updated value of a:", a)
   fmt.Println("Updated value of b:", *b)
   time.Sleep(time.Second * 2)
   fmt.Println("Main ended...")
}
```
And the output of this is:
```
Main started...
Value of b before putting into channel 10
Updated value of a: 20
Updated value of b: 20
Data in channel is:  10
Main ended...
```
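So, as you can see, we sent b, a pointer to the variable a, into the channel, and we modified a afterwards. In the run shown, the receiver still printed 10. The send performs a memory copy of the value being sent, so changing the sender’s variable later does not change what was handed to the channel. Note that the value copied here is the pointer itself, so both goroutines can still reach the same a, and the receiver reads it without synchronization; the printed value therefore depends on when the receiver dereferences the pointer. Sending the int itself rather than a pointer gives the receiver a fully independent copy.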

Write in case of buffer overflow:

We’ve seen that a goroutine can add data up to the buffer capacity, but what happens when the buffer capacity is reached? When the buffer has no more space and a goroutine, let’s say G1, wants to write data, the Go scheduler blocks/pauses G1, which waits until another goroutine, say G2, receives from the channel. Once G2 has consumed data and freed up space, the Go scheduler makes G1 runnable again. Remember this scenario, as we’ll use G1 and G2 frequently from here onwards.

We know that goroutines work in a pause-and-resume fashion, but who controls it? As you might have guessed, the Go scheduler does the magic here. The Go scheduler does a few things that are very important for goroutines and channels.

Go Runtime Scheduler

You may already know this, but goroutines are user-space threads. The OS can schedule and manage threads, but doing so is comparatively expensive, given the state that each OS thread carries.

That’s why the Go scheduler handles the goroutines, and it basically multiplexes the goroutines on the OS threads. Let’s see how.

There are scheduling models, like 1:1, N:1, etc., but the Go scheduler uses the M:N scheduling model.

Basically, this means that there are a number of goroutines and OS threads, and the scheduler basically schedules the M goroutines on N OS threads. For example:

OS Thread 1:

OS Thread 2:

As you can see, there are two OS threads, and the scheduler is running six goroutines by swapping them as needed. The Go scheduler has three structures as below:
- M: M represents the OS thread, which is entirely managed by the OS and is similar to a POSIX thread. M stands for machine.
- G: G represents the goroutine. A goroutine has a resizable stack and also carries information about scheduling, any channel it’s blocked on, etc.
- P: P is a context for scheduling. It is like a single thread that runs the Go code and multiplexes M goroutines onto N OS threads. This is the important part, and that’s why P stands for processor.
Diagrammatically, we can represent the scheduler as:

(This diagram is referenced from The Go scheduler)

The P processor basically holds the queue of runnable goroutines—or simply run queues.

So, anytime a goroutine (G) needs to run on an OS thread (M), that OS thread first gets hold of a P, i.e., the context. This matters when a goroutine needs to be paused so that other goroutines can run. One such case is a buffered channel: when the buffer is full, we pause the sender goroutine and activate the receiver goroutine.

Imagine the above scenario: G1 is a sender that tries to send on a full buffered channel, and G2 is a receiver goroutine. When G1 tries to send on the full channel, it calls into the Go runtime scheduler through gopark. The scheduler then changes the state of G1 from running to waiting and schedules another goroutine from the run queue, say G2.

This transition diagram might help you better understand:

As you can see, after the gopark call, G1 is in a waiting state and G2 is running. We haven’t paused the OS thread (M); instead, we’ve blocked the goroutine and scheduled another one. So, we are using maximum throughput of an OS thread. The context switching of goroutine is handled by the scheduler (P), and because of this, it adds complexity to the scheduler.

This is great. But how do we resume G1, since it still wants to put its data/task on the channel? Before G1 calls gopark, it records its own state on the hchan struct, i.e., our channel, in the sendq field. Remember the sendq and recvq fields? They hold the waiting senders and receivers.

Now, G1 stores the state of itself as a sudog struct. A sudog simply represents a goroutine that is waiting on an element. The sudog struct has these fields:
```
type sudog struct {
	g        *g
	isSelect bool
	next     *sudog
	prev     *sudog
	elem     unsafe.Pointer // data element
	// ...
}
```
g is the waiting goroutine, next and prev point to the next and previous sudog in the wait queue (if there are any), and elem points to the actual element it’s waiting on.
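To see how the next and prev pointers form a queue of waiting goroutines, here is a toy model. The types waiter and waitQueue are hypothetical stand-ins written for this post, not the runtime's own definitions: a string replaces the g pointer and an int replaces the unsafe.Pointer element.

```go
package main

import "fmt"

// waiter is a hypothetical, simplified stand-in for the runtime's sudog.
type waiter struct {
	name string // stands in for the g pointer
	elem int    // stands in for the data element
	next *waiter
	prev *waiter
}

// waitQueue is a FIFO doubly linked list of waiters, playing the role of
// sendq or recvq in this toy model.
type waitQueue struct {
	first *waiter
	last  *waiter
}

// enqueue appends w to the back of the queue.
func (q *waitQueue) enqueue(w *waiter) {
	w.next = nil
	w.prev = q.last
	if q.last == nil {
		q.first = w
	} else {
		q.last.next = w
	}
	q.last = w
}

// dequeue removes and returns the oldest waiter, or nil if the queue is empty.
func (q *waitQueue) dequeue() *waiter {
	w := q.first
	if w == nil {
		return nil
	}
	q.first = w.next
	if q.first == nil {
		q.last = nil
	} else {
		q.first.prev = nil
	}
	w.next, w.prev = nil, nil
	return w
}

func main() {
	var sendq waitQueue
	sendq.enqueue(&waiter{name: "G1", elem: 42})
	sendq.enqueue(&waiter{name: "G3", elem: 7})
	w := sendq.dequeue()
	fmt.Println(w.name, w.elem) // G1 42
}
```

The first goroutine to start waiting is the first one to be picked up, which is why a FIFO queue is a natural fit here.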

So, considering our example, G1 is waiting to write data, so it creates a record of its own state, the sudog, as below:

Cool. Now we know, before going into the waiting state, what operations G1 performs. Currently, G2 is in a running state, and it will start consuming the channel data.

As soon as G2 receives the first data/task, it checks the sendq field of the hchan struct for waiting goroutines and finds that G1 is waiting to push data or a task. Now, here is the interesting thing: G2 itself copies that data/task from G1’s sudog into the buffer. It then calls into the scheduler, which moves G1 from waiting to runnable, adds G1 to the run queue, and returns to G2. This call, made by G2 on behalf of G1, is known as goready. Impressive, right? Go behaves like this so that G1, when it runs again, doesn’t have to acquire the lock and push the data/task itself. That extra overhead is handled by G2. That’s why the sudog holds both the data/task and the details of the waiting goroutine. So, the state of G1 is like this:

As you can see, G1 is placed on a run queue. Now we know what the goroutine and the Go scheduler do in the case of buffered channels. In this example, the sender goroutine came first, but what if the receiver goroutine comes first? What if there’s no data in the channel and the receiver goroutine is executed first? The receiver goroutine (G2) will create a sudog in recvq on the hchan struct. Things get a little twisted when the G1 goroutine runs. It checks whether any goroutines are waiting in recvq, and if there are, it copies the task to the waiting goroutine’s (G2) memory location, i.e., the elem attribute of the sudog.

This is incredible! Instead of writing to the buffer, G1 writes the task/data directly into the waiting goroutine’s space, simply to spare G2 the overhead when it runs again. We know that each goroutine has its own resizable stack, and goroutines never touch each other’s space except in the case of channels. So far, we have seen how send and receive happen in a buffered channel.
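The receiver-first case can be tried out with a small sketch. receiverFirst is a hypothetical helper, and the time.Sleep is only a crude way to give the receiver time to park before the sender runs; it is not a guarantee. When the receiver really is parked, the buffer length read right after the send should be 0, because the value went straight to the receiver, but the direct copy itself happens inside the runtime and is not something user code can verify.

```go
package main

import (
	"fmt"
	"time"
)

// receiverFirst starts the receiver (G2) before the sender (G1) sends. It
// returns the received value and the buffer length seen right after the send.
func receiverFirst() (int, int) {
	ch := make(chan int, 4)
	result := make(chan int)

	go func() {
		v := <-ch // G2 waits here until a value arrives
		result <- v
	}()

	// Assumption: 50ms is enough for G2 to start and park on the empty channel.
	time.Sleep(50 * time.Millisecond)

	ch <- 99 // G1 sends while G2 is (most likely) already waiting
	buffered := len(ch)
	return <-result, buffered
}

func main() {
	v, buffered := receiverFirst()
	fmt.Println("received:", v)
	fmt.Println("items left in buffer after send:", buffered)
}
```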

This may have been confusing, so let me give you the summary of the send operation.

Summary of a send operation for buffered channels:
1. Acquire lock on the entire channel or the hchan struct.
2. Check if there’s any sudog or a waiting goroutine in the recvq. If so, then put the element directly into its stack. We saw this just now with G1 writing to G2’s stack.
3. If recvq is empty, then check whether the buffer has space. If yes, then do a memory copy of the data.
4. If the buffer is full, then create a sudog under sendq of the hchan struct, which will hold details such as the currently executing goroutine and the data to put on the channel.
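The decision order in these steps can be sketched as a toy, single-threaded model. toyChan and its fields are hypothetical and deliberately simplified: there is no lock (step 1) because nothing runs concurrently, recvq holds pointers to where waiting receivers want their value, and sendq holds only the values of parked senders.

```go
package main

import "fmt"

// toyChan is a hypothetical, single-threaded model of a buffered channel.
type toyChan struct {
	buf   []int  // elements currently in the buffer
	size  int    // buffer capacity
	recvq []*int // waiting receivers: where each one wants its value written
	sendq []int  // values held by parked senders
}

// send follows the order of the summary and reports which path was taken.
func (c *toyChan) send(v int) string {
	// Step 1 (acquire the lock) is omitted in this single-threaded model.

	// Step 2: a receiver is waiting, so hand the value over directly.
	if len(c.recvq) > 0 {
		dst := c.recvq[0]
		c.recvq = c.recvq[1:]
		*dst = v
		return "direct"
	}

	// Step 3: no receiver is waiting, but the buffer has space.
	if len(c.buf) < c.size {
		c.buf = append(c.buf, v)
		return "buffered"
	}

	// Step 4: the buffer is full, so record the sender and its value.
	c.sendq = append(c.sendq, v)
	return "parked"
}

func main() {
	var slot int // memory that belongs to a waiting receiver
	c := &toyChan{size: 1, recvq: []*int{&slot}}

	fmt.Println(c.send(1), slot) // direct 1
	fmt.Println(c.send(2))       // buffered
	fmt.Println(c.send(3))       // parked
}
```

The real runtime also has to park and wake goroutines at the matching points, which this model leaves out.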
We have seen all the above steps in detail, but concentrate on the last point.

The last step is similar to how an unbuffered channel works. For unbuffered channels, every send must be paired with a receive and vice versa: whichever side arrives first waits for the other.

Keep in mind that an unbuffered channel always works like a direct send. A summary of read and write operations on an unbuffered channel could be:
- Sender first: At this point, there’s no receiver, so the sender will create a sudog of itself in sendq, and the receiver will later take the value from that sudog.
- Receiver first: The receiver will create a sudog in recvq, and the sender will directly put the data in the receiver’s stack.
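Both cases look the same from user code, as this minimal sketch shows. handoff is a hypothetical helper: the sender cannot complete until the receiver is ready, and the channel never stores the value in a buffer because its capacity is 0.

```go
package main

import "fmt"

// handoff passes v through an unbuffered channel and returns what was received.
func handoff(v string) string {
	ch := make(chan string) // no capacity given, so the channel is unbuffered
	go func() {
		ch <- v // blocks until a receiver is ready to take the value
	}()
	return <-ch // blocks until a sender is ready to give a value
}

func main() {
	ch := make(chan string)
	fmt.Println(cap(ch))         // 0
	fmt.Println(handoff("task")) // task
}
```

Whichever goroutine reaches the channel first simply waits for the other, so the result does not depend on the order in which they are scheduled.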
With this, we have covered the basics of channels. We’ve learned how read and write operate in buffered and unbuffered channels, and we talked about the Go runtime scheduler.

Conclusion:

Channels are a very interesting Golang topic. They seem difficult to understand at first, but once you learn the mechanism, they’re very powerful and help you achieve concurrency in applications. Hopefully, this blog helps you understand the fundamental concepts and operations of channels.
December 12, 2022