Mesosphere DC/OS Masterclass : Tips and Tricks to Make Life Easier

DC/OS is an open-source distributed operating system for the data center, built on the Apache Mesos distributed systems kernel. As a distributed system, it is a cluster of master nodes and private/public agent nodes, where each node also has a host operating system that manages the underlying machine.

It enables the management of multiple machines as if they were a single computer. It automates resource management, schedules process placement, facilitates inter-process communication, and simplifies the installation and management of distributed services. Its included web interface and available command-line interface (CLI) facilitate remote management and monitoring of the cluster and its services.

Distributed System : DC/OS is a distributed system made up of a group of private and public agent nodes that are coordinated by master nodes.
Cluster Manager : DC/OS is responsible for running tasks on agent nodes and providing them with the resources they require. DC/OS uses Apache Mesos to provide cluster management functionality.
Container Platform : All DC/OS tasks are containerized. DC/OS uses two container runtimes, Docker and Mesos, so containers can be started from Docker images, or they can be native executables (binaries or scripts) that are containerized at runtime by Mesos.
Operating System : As the name suggests, DC/OS is an operating system: it abstracts the cluster's hardware and software resources and provides common services to applications.

Unlike Linux, DC/OS is not a host operating system. DC/OS spans multiple machines, but relies on each machine to have its own host operating system and host kernel.

The high level architecture of DC/OS can be seen below :

For the detailed architecture and components of DC/OS, see the official DC/OS documentation.

Adoption and usage of Mesosphere DC/OS:

Mesosphere customers include :

30% of the Fortune 50 U.S. Companies
5 of the top 10 North American Banks
7 of the top 12 Worldwide Telcos
5 of the top 10 Highest Valued Startups

Some companies using DC/OS are :

Cisco
Yelp
Tommy Hilfiger
Uber
Netflix
Verizon
Cerner
NIO

Installing and using DC/OS

A guide to installing DC/OS can be found in the official DC/OS documentation. After installing DC/OS on any platform, install the DC/OS CLI by following the CLI installation documentation.

Using the DC/OS CLI, we can manage cluster nodes, manage Marathon tasks and services, and install or remove packages from the Universe. It also lends itself well to automation, since most CLI commands can output JSON.

NOTE: The commands and scripts below were run and tested with the following tools:

DC/OS 1.11 Open Source
DC/OS cli 0.6.0
jq:1.5-1-a5b5cbe

DC/OS commands and scripts

Setup DC/OS cli with DC/OS cluster

dcos cluster setup <CLUSTER URL>

Example :

dcos cluster setup http://dcos-cluster.com

The above command prints a link for OAuth authentication and prompts for an auth token. You can authenticate with a Google, GitHub or Microsoft account. Paste the token generated after authentication into the CLI prompt. (This applies only if OAuth is enabled.)

DC/OS authentication token

dcos config show core.dcos_acs_token

DC/OS cluster url

dcos config show core.dcos_url

DC/OS cluster name

dcos config show cluster.name

Access Mesos UI

<DC/OS_CLUSTER_URL>/mesos

Example:

http://dcos-cluster.com/mesos

Access Marathon UI

<DC/OS_CLUSTER_URL>/service/marathon

Example:

http://dcos-cluster.com/service/marathon

Access any DC/OS service, like Marathon, Kafka, Elastic, Spark etc.[DC/OS Services]

<DC/OS_CLUSTER_URL>/service/<SERVICE_NAME>

Example:

http://dcos-cluster.com/service/marathon
http://dcos-cluster.com/service/kafka

Access DC/OS slaves info in json using Mesos API [Mesos Endpoints]

curl -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/mesos/slaves" | jq .
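
Most API calls in this post follow the same pattern: take the cluster URL, append an endpoint path, and send the ACS token as a bearer header. As a minimal sketch, the helper below only builds the URL. The name dcos_api_url is hypothetical (it is not part of the DC/OS CLI), and the cluster URL is the sample one used in this post.

```shell
# Hypothetical helper (not part of the DC/OS CLI): join a cluster URL and an
# endpoint path without producing a double slash.
dcos_api_url() {
  # ${1%/} drops one trailing "/" from the base URL,
  # ${2#/} drops one leading "/" from the endpoint path.
  printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Example with the sample cluster URL from this post
dcos_api_url "http://dcos-cluster.com/" "/mesos/slaves"
```

Against a real cluster, the base URL would come from 'dcos config show core.dcos_url', and the result would be passed to curl together with the Authorization header.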

Access DC/OS slaves info in json using DC/OS cli

dcos node --json

Note : The DC/OS CLI command 'dcos node --json' is equivalent to calling the Mesos slaves endpoint (/mesos/slaves).

Access DC/OS private slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip == null) | "Private Agent : " + .hostname ' -r

Access DC/OS public slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | select(.attributes.public_ip != null) | "Public Agent : " + .hostname ' -r

Access DC/OS private and public slaves info using DC/OS cli

dcos node --json | jq '.[] | select(.type | contains("agent")) | if (.attributes.public_ip != null) then "Public Agent : " else "Private Agent : " end + " - " + .hostname ' -r | sort

Get public IP of all public agents

#!/bin/bash
for id in $(dcos node --json | jq --raw-output '.[] | select(.attributes.public_ip == "true") | .id'); 
do 
      dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --mesos-id=$id "curl -s ifconfig.co"
done 2>/dev/null

Note: 'dcos node ssh' requires your private key to be available to SSH. Make sure you add the private key as an SSH identity using :

ssh-add </path/to/private/key/file/.pem>

Get public IP of master leader

dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "curl -s ifconfig.co" 2>/dev/null

Get all master nodes and their private ip

dcos node --json | jq '.[] | select(.type | contains("master"))
| .ip + " = " + .type' -r

Get list of all users who have access to DC/OS cluster

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/acs/api/v1/users" | jq '.array[].uid' -r

Add users to cluster using Mesosphere script (Run this on master)

Users to add are given in list.txt, each user on new line

for i in `cat list.txt`; do echo $i;
sudo -i dcos-shell /opt/mesosphere/bin/dcos_add_user.py $i; done

Add users to cluster using DC/OS API

#!/bin/bash
# Usage : dcosAddUsers.sh <users to add are listed in users.list, one user per line>
for i in `cat users.list`;
do
  echo $i
  curl -X PUT -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
done

Delete users from DC/OS cluster organization

#!/bin/bash
# Usage : dcosDeleteUsers.sh <users to delete are listed in users.list, one user per line>
for i in `cat users.list`;
do
  echo $i
  curl -X DELETE -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/acs/api/v1/users/$i" -d "{}"
done
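
A user list that is edited by hand tends to pick up blank lines and comments. As a sketch of a more tolerant way to read such a file, the function below yields one user per line and skips anything that is empty or starts with '#'. The function name read_users and the sample addresses are hypothetical, and the function only prints the users rather than calling the DC/OS API.

```shell
# Hypothetical helper: print the users listed in a file, one per line,
# skipping blank lines and lines that start with '#'.
read_users() {
  # $1: path to a file with one user per line.
  # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
  while IFS= read -r user || [ -n "$user" ]; do
    case "$user" in
      ''|'#'*) continue ;;
    esac
    printf '%s\n' "$user"
  done < "$1"
}

# Example with a throwaway file
sample=$(mktemp)
printf 'alice@example.com\n\n# on leave\nbob@example.com' > "$sample"
read_users "$sample"
rm -f "$sample"
```

Its output can be fed to a 'while read' loop that issues the PUT or DELETE request for each user.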

Offers/resources from individual DC/OS agent

In recent versions of many DC/OS services, a scheduler endpoint at

http://yourcluster.com/service/<service-name>/v1/debug/offers

will display an HTML table containing a summary of recently-evaluated offers. This table’s contents are currently very similar to what can be found in logs, but in a slightly more accessible format. Alternately, we can look at the scheduler’s logs in stdout. An offer is a set of resources all from one individual DC/OS agent.

<DC/OS_CLUSTER_URL>/service/<service_name>/v1/debug/offers

Example:

http://dcos-cluster.com/service/kafka/v1/debug/offers
http://dcos-cluster.com/service/elastic/v1/debug/offers

Save JSON configs of all running Marathon apps

#!/bin/bash
# Save marathon configs in json format for all marathon apps
# Usage : saveMarathonConfig.sh
for service in `dcos marathon app list --quiet | sort`; do
  # App ids look like /group/app : keep the id intact for the CLI call and
  # build a flat file name by dropping the leading "/" and replacing the rest with "_"
  file=`echo $service | sed 's|^/||; s|/|_|g'`.json
  dcos marathon app show $service | jq '. | del(.tasks, .version, .versionInfo, .tasksHealthy, .tasksRunning, .tasksStaged, .tasksUnhealthy, .deployments, .executor, .lastTaskFailure, .args, .ports, .residency, .secrets, .storeUrls, .uris, .user)' > $file
done

Get report of Marathon apps with details like container type, Docker image, tag or service version used by Marathon app.

#!/bin/bash
TMP_CSV_FILE=$(mktemp /tmp/dcos-config.XXXXXX.csv)
TMP_CSV_FILE_SORT="${TMP_CSV_FILE}_sort"
#dcos marathon app list --json | jq '.[] | if (.container.docker.image != null ) then .id + ",Docker Application," + .container.docker.image else .id + ",DCOS Service," + .labels.DCOS_PACKAGE_VERSION end' -r > $TMP_CSV_FILE
# One CSV row per app : id, container type, image (or package name:version, or "[ CMD ]")
dcos marathon app list --json | jq '.[] | .id + if (.container.type == "DOCKER") then ",Docker Container," + .container.docker.image else ",Mesos Container," + if(.labels.DCOS_PACKAGE_VERSION !=null) then .labels.DCOS_PACKAGE_NAME+":"+.labels.DCOS_PACKAGE_VERSION  else "[ CMD ]" end end' -r > $TMP_CSV_FILE
sed -i "s|^/||g" $TMP_CSV_FILE
sort -t "," -k2,2 -k3,3 -k1,1 $TMP_CSV_FILE > ${TMP_CSV_FILE_SORT}
cnt=1
printf '%.0s=' {1..150}
printf "\n  %-5s%-35s%-23s%-40s%-20s\n" "No" "Application Name" "Container Type" "Docker Image" "Tag / Version"
printf '%.0s=' {1..150}
while IFS=, read -r app typ image;
do
        # Tag is the part after ":", "latest" if there is none, "NA" for plain commands
        tag=`echo $image | awk -F':' -v im="$image" '{tag=(im=="[ CMD ]")?"NA":($2=="")?"latest":$2; print tag}'`
        image=`echo $image | awk -F':' '{print $1}'`
        printf "\n  %-5s%-35s%-23s%-40s%-20s" "$cnt" "$app" "$typ" "$image" "$tag"
        cnt=$((cnt + 1))
        sleep 0.3
done < $TMP_CSV_FILE_SORT
printf "\n"
printf '%.0s=' {1..150}
printf "\n"
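
The report script splits the image on ':' and takes the second field as the tag. That misreads images pulled from a registry that listens on a port, such as registry.example.com:5000/team/app. The following is a sketch of a stricter lookup that uses only shell parameter expansion; the function name image_tag and the registry host are hypothetical.

```shell
# Hypothetical helper: print the tag of a Docker image reference,
# or "latest" when the reference carries no tag.
image_tag() {
  # Keep only the last path component, so a registry port such as
  # "registry.example.com:5000/" is not mistaken for a tag separator.
  name="${1##*/}"
  case "$name" in
    *:*) printf '%s\n' "${name##*:}" ;;
    *)   printf 'latest\n' ;;
  esac
}

# Examples
image_tag "nginx:1.15"
image_tag "registry.example.com:5000/team/app"
```

References pinned by digest (image@sha256:...) are not handled by this sketch and would need a separate case.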

Get DC/OS nodes with more information like node type, node ip, attributes, number of running tasks, free memory, free cpu etc.

#!/bin/bash
printf "\n  %-15s %-18s%-18s%-10s%-15s%-10s\n" "Node Type" "Node IP" "Attribute" "Tasks" "Mem Free (MB)" "CPU Free"
printf '%.0s=' {1..90}
printf "\n"
# Literal tab character, used as the field separator for sort
TAB=$(printf '\t')
dcos node --json | jq '.[] | if (.type | contains("leader")) then "Master (leader)" elif ((.type | contains("agent")) and .attributes.public_ip != null) then "Public Agent" elif ((.type | contains("agent")) and .attributes.public_ip == null) then "Private Agent" else empty end + "\t"+ if(.type |contains("master")) then .ip else .hostname end + "\t" +  (if (.attributes | length !=0) then (.attributes | to_entries[] | join(" = ")) else "NA" end) + "\t" + if(.type |contains("agent")) then (.TASK_RUNNING|tostring) + "\t" + ((.resources.mem - .used_resources.mem)| tostring) + "\t\t" +  ((.resources.cpus - .used_resources.cpus)| tostring)  else "\t\tNA\tNA\t\tNA"  end' -r | sort -t"$TAB" -k1,1d -k3,3d -k2,2d
printf '%.0s=' {1..90}
printf "\n"

Framework Cleaner

Uninstall a framework and clean up any resources it still has reserved after it is deleted/uninstalled. The janitor step is needed only on DC/OS 1.9 or older; on DC/OS 1.10 and later, 'dcos package uninstall' alone is sufficient.

SERVICE_NAME=<service name>
dcos package uninstall $SERVICE_NAME
dcos node ssh --option StrictHostKeyChecking=no --master-proxy \
--leader "docker run mesosphere/janitor /janitor.py -r \
${SERVICE_NAME}-role -p ${SERVICE_NAME}-principal -z dcos-service-${SERVICE_NAME}"
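
The janitor arguments are all derived from the service name: a role, a principal, and a ZooKeeper node. As a sketch, the derivation can be wrapped in a small function that refuses an empty name, since running the cleaner with "-role" and "-principal" alone would target the wrong resources. The function name janitor_args is hypothetical, and the naming pattern is simply the one used in the command above.

```shell
# Hypothetical helper: print the janitor.py arguments for a service name,
# following the <name>-role / <name>-principal / dcos-service-<name> pattern.
janitor_args() {
  if [ -z "${1:-}" ]; then
    echo "service name is required" >&2
    return 1
  fi
  printf '%s\n' "-r $1-role -p $1-principal -z dcos-service-$1"
}

# Example
janitor_args kafka
```

The printed string can be appended to the 'docker run mesosphere/janitor /janitor.py' part of the command.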

Get DC/OS apps and their placement constraints

dcos marathon app list --json | jq '.[] |
if (.constraints != null) then .id, .constraints else empty end'

Run shell command on all slaves

#!/bin/bash
# Run any shell command on all slave nodes (private and public)
# Usage : dcosRunOnAllSlaves.sh <CMD= any shell command to run, Ex: ulimit -a >
CMD=$1
for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'`; do
   echo -e "\n###> Running command [ $CMD ] on $i"
   dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
   echo -e "======================================\n"
done

Run shell command on master leader

CMD="<shell command, Ex: ulimit -a>"
dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --leader "$CMD"

Run shell command on all master nodes

#!/bin/bash
# Run any shell command on all master nodes
# Usage : dcosRunOnAllMasters.sh <CMD= any shell command to run, Ex: ulimit -a >
CMD=$1
for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'`
do
  echo -e "\n###> Running command [ $CMD ] on $i"
  dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "$CMD"
  echo -e "======================================\n"
done

Add node attributes to dcos nodes and run apps on nodes with required attributes using placement constraints

#!/bin/bash
#1. SSH on node
#2. Create or edit file /var/lib/dcos/mesos-slave-common
#3. Add contents as :
#    MESOS_ATTRIBUTES=<key>:<value>
#    Example:
#    MESOS_ATTRIBUTES=TYPE:DB;DB_TYPE:MONGO;
#4. Stop dcos-mesos-slave service
#    systemctl stop dcos-mesos-slave
#5. Remove link for latest slave metadata
#    rm -f /var/lib/mesos/slave/meta/slaves/latest
#6. Start dcos-mesos-slave service
#    systemctl start dcos-mesos-slave
#7. Wait for some time, node will be in HEALTHY state again.
#8. Add app placement constraint with field = key and value = value
#9. Verify attributes, run on any node
#    curl -s http://leader.mesos:5050/state | jq '.slaves[]| .hostname ,.attributes'
#    OR Check DCOS cluster UI
#    Nodes => Select any Node => Details Tab
tmpScript=$(mktemp "/tmp/addDcosNodeAttributes-XXXXXXXX")
# key:value paired attributes, separated by ; (quoted so that ";" is not treated as a command separator)
ATTRIBUTES="NODE_TYPE:GPU_NODE"
# The heredoc is unquoted on purpose : ${ATTRIBUTES} is expanded locally before the script is sent to the node
cat <<EOF > ${tmpScript}
echo "MESOS_ATTRIBUTES=${ATTRIBUTES}" | sudo tee /var/lib/dcos/mesos-slave-common
sudo systemctl stop dcos-mesos-slave
sudo rm -f /var/lib/mesos/slave/meta/slaves/latest
sudo systemctl start dcos-mesos-slave
EOF
# Add the private ip of nodes on which you want to add attributes to nodes.txt, one ip per line.
for i in `cat nodes.txt`; do
    echo $i
    dcos node ssh --master-proxy --option StrictHostKeyChecking=no --private-ip $i <$tmpScript
    sleep 10
done
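
Step 8 in the script comments mentions a placement constraint without showing one. In a Marathon app definition, constraints are a list of [field, operator, value] triples, and the CLUSTER operator pins an app to agents whose attribute has the given value. The sketch below prints that fragment for one attribute; the function name marathon_constraint is hypothetical, and NODE_TYPE / GPU_NODE are the sample attribute from the script.

```shell
# Hypothetical helper: print the "constraints" fragment of a Marathon app
# definition that pins the app to agents with attribute <key> = <value>.
marathon_constraint() {
  # $1: attribute key, $2: attribute value
  printf '"constraints": [["%s", "CLUSTER", "%s"]]\n' "$1" "$2"
}

# Example with the attribute set by the script
marathon_constraint NODE_TYPE GPU_NODE
```

The fragment goes at the top level of the app's JSON definition, next to fields such as "id" and "cpus".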

Install DC/OS Datadog metrics plugin on all DC/OS nodes

#!/bin/bash

# Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>

DDAPI=$1

if [[ -z $DDAPI ]]; then
    echo "[Datadog Plugin] Need datadog API key as parameter."
    echo "[Datadog Plugin] Usage : bash installDCOSDataDogMetricsPlugin.sh <Datadog API KEY>."
    exit 1
fi
tmpScriptMaster=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")
tmpScriptAgent=$(mktemp "/tmp/installDatadogPlugin-XXXXXXXX")

declare agent=$tmpScriptAgent
declare master=$tmpScriptMaster

for role in "agent" "master"
do
cat <<EOF > ${!role}
curl -s -o /opt/mesosphere/bin/dcos-metrics-datadog -L https://downloads.mesosphere.io/dcos-metrics/plugins/datadog
chmod +x /opt/mesosphere/bin/dcos-metrics-datadog
echo "[Datadog Plugin] Downloaded dcos datadog metrics plugin."
export DD_API_KEY=$DDAPI
export AGENT_ROLE=$role
sudo curl -s -o /etc/systemd/system/dcos-metrics-datadog.service https://downloads.mesosphere.io/dcos-metrics/plugins/datadog.service
echo "[Datadog Plugin] Downloaded dcos-metrics-datadog.service."
sudo sed -i "s/--dcos-role master/--dcos-role \$AGENT_ROLE/g;s/--datadog-key .*/--datadog-key \$DD_API_KEY/g" /etc/systemd/system/dcos-metrics-datadog.service
echo "[Datadog Plugin] Updated dcos-metrics-datadog.service with DD API Key and agent role."
sudo systemctl daemon-reload
sudo systemctl start dcos-metrics-datadog.service
echo "[Datadog Plugin] dcos-metrics-datadog.service is started !"
servStatus=\$(sudo systemctl is-failed dcos-metrics-datadog.service)
echo "[Datadog Plugin] dcos-metrics-datadog.service status : \${servStatus}"
#sudo systemctl status dcos-metrics-datadog.service | head -3
#sudo journalctl -u dcos-metrics-datadog
EOF
done

echo "[Datadog Plugin] Temp script for master saved at : $tmpScriptMaster"
echo "[Datadog Plugin] Temp script for agent saved at : $tmpScriptAgent"

for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` 
do 
    echo -e "\n###> Node - $i"
    dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptAgent
    echo -e "======================================================="
done

for i in `dcos node | egrep -v "TYPE|agent" | awk '{print $2}'` 
do 
    echo -e "\n###> Master Node - $i"
    dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --master-proxy --private-ip=$i < $tmpScriptMaster
    echo -e "======================================================="
done

# Check status of dcos-metrics-datadog.service on all nodes.
#for i in `dcos node | egrep -v "TYPE|master" | awk '{print $1}'` ; do  echo -e "\n###> $i"; dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --master-proxy --private-ip=$i "sudo systemctl is-failed dcos-metrics-datadog.service"; echo -e "======================================\n"; done

Get app / node metrics fetched by dcos-metrics component using metrics API

Get DC/OS node id [dcos node]
Get Node metrics (CPU, memory, local filesystems, networks, etc) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/node
Get id of all containers running on that agent : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers
Get Resource allocation and usage for the given container ID : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>
Get Application-level metrics from the container (shipped in StatsD format using the listener available at STATSD_UDP_HOST and STATSD_UDP_PORT) : <DC/OS_CLUSTER_URL>/system/v1/agent/<agent_id>/metrics/v0/containers/<container_id>/app
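
The four metrics endpoints share one prefix and differ only in the tail. As a sketch, a single function can build any of them from the cluster URL, the agent id and, where needed, the container id. The function name metrics_url and its "kind" keywords are hypothetical; the paths are the ones listed above, and the ids in the example are made up.

```shell
# Hypothetical helper: build a dcos-metrics API URL.
#   $1 cluster URL, $2 agent id, $3 kind (node|containers|container|app),
#   $4 container id (only for "container" and "app")
metrics_url() {
  base="${1%/}/system/v1/agent/$2/metrics/v0"
  case "$3" in
    node)       printf '%s/node\n' "$base" ;;
    containers) printf '%s/containers\n' "$base" ;;
    container)  printf '%s/containers/%s\n' "$base" "$4" ;;
    app)        printf '%s/containers/%s/app\n' "$base" "$4" ;;
    *)          echo "unknown kind: $3" >&2; return 1 ;;
  esac
}

# Example with made-up ids
metrics_url "http://dcos-cluster.com" "agent-S1" app "container-42"
```

Each URL still needs the Authorization header with the ACS token when it is requested with curl.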

Get app / node metrics fetched by dcos-metrics component using dcos cli

Summary of container metrics for a specific task

dcos task metrics summary <task-id>

All metrics in details for a specific task

dcos task metrics details <task-id>

Summary of Node metrics for a specific node

dcos node metrics summary <mesos-node-id>

All Node metrics in details for a specific node

dcos node metrics details <mesos-node-id>

NOTE: All of the above commands accept the '--json' flag so that their output can be used programmatically.

Launch / run command inside container for a task

The DC/OS CLI command 'dcos task exec' only supports Mesos containers; this script supports both Mesos and Docker containers.

#!/bin/bash
echo "DCOS Task Exec 2.0"
if [ "$#" -eq 0 ]; then
        echo "Need task name or id as input. Exiting."
        exit 1
fi
taskName=$1
taskCmd=${2:-bash}
TMP_TASKLIST_JSON=/tmp/dcostasklist.json
dcos task --json > $TMP_TASKLIST_JSON
# Count the tasks whose name matches exactly
taskExist=`cat $TMP_TASKLIST_JSON | jq --arg tname "$taskName" '.[] | if(.name == $tname ) then .name else empty end' -r | wc -l`
if [[ $taskExist -eq 0 ]]; then
        echo "No task with name $taskName exists."
        echo "Do you mean ?"
        dcos task | grep "$taskName" | awk '{print $1}'
        exit 1
fi
# Container type and id of the first matching task
taskType=`cat $TMP_TASKLIST_JSON | jq --arg tname "$taskName" '[.[] | select(.name == $tname)][0] | .container.type' -r`
TaskId=`cat $TMP_TASKLIST_JSON | jq --arg tname "$taskName" '[.[] | select(.name == $tname)][0] | .id' -r`
if [[ $taskExist -ne 1 ]]; then
        echo -e "More than one instance found. Please enter the task ID to run the command on.\n"
        #allTaskIds=$(dcos task $taskName | tee /dev/tty | grep -v "NAME" | awk '{print $5}' | paste -s -d",")
        echo ""
        read TaskId
fi
if [[ $taskType !=  "DOCKER" ]]; then
        echo "Task [ $taskName ] is of type MESOS Container."
        execCmd="dcos task exec --interactive --tty $TaskId $taskCmd"
        echo "Running [$execCmd]"
        $execCmd
else
        echo "Task [ $taskName ] is of type DOCKER Container."
        taskNodeIP=`dcos task $TaskId | awk 'FNR == 2 {print $2}'`
        echo "Task [ $taskName ] with task Id [ $TaskId ] is running on node [ $taskNodeIP ]."
        # Find the Docker container that carries this task id as a label
        taskContID=`dcos node ssh --option LogLevel=quiet --option StrictHostKeyChecking=no --private-ip=$taskNodeIP --master-proxy "docker ps -q --filter label=MESOS_TASK_ID=$TaskId" 2> /dev/null`
        # Strip the carriage return that the ssh session adds to the output
        taskContID=`echo $taskContID | tr -d '\r'`
        echo "Task Docker Container ID : [ $taskContID ]"
        echo "Running [ docker exec -it $taskContID $taskCmd ]"
        dcos node ssh --option StrictHostKeyChecking=no --option LogLevel=quiet --private-ip=$taskNodeIP --master-proxy "docker exec -it $taskContID $taskCmd" 2>/dev/null
fi

Get DC/OS tasks by node

#!/bin/bash 
function tasksByNodeAPI
{
    echo "DC/OS Tasks By Node"
    if [ "$#" -eq 0 ]; then
        echo "Need node ip as input. Exiting."
        exit 1
    fi
    nodeIp=$1
    mesosId=`dcos node | grep $nodeIp | awk '{print $3}'`
    if [ -z "$mesosId" ]; then
        echo "No node found with ip $nodeIp. Exiting."
        exit 1
    fi
    curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" "$(dcos config show core.dcos_url)/mesos/tasks?limit=10000" | jq --arg mesosId "$mesosId" '.tasks[] | select (.slave_id == $mesosId and .state == "TASK_RUNNING") | .name + "\t\t\t" + .id'  -r
}
function tasksByNodeCLI
{
        echo "DC/OS Tasks By Node"
        if [ "$#" -eq 0 ]; then
                echo "Need node ip as input. Exiting."
                exit 1
        fi
        nodeIp=$1
        dcos task | egrep "HOST|$nodeIp"
}
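
When the task list needs further processing in a script, the filter that the jq expression applies can also be written in Python. The following is a minimal sketch, assuming the JSON from /mesos/tasks has already been fetched and parsed; the function name running_tasks_on_agent and the sample data are illustrative and not part of DC/OS.

```python
def running_tasks_on_agent(tasks_response, agent_id):
    """Return (name, id) pairs for running tasks on one agent.

    Mirrors the jq filter: slave_id matches and state is TASK_RUNNING.
    """
    return [
        (task["name"], task["id"])
        for task in tasks_response.get("tasks", [])
        if task.get("slave_id") == agent_id and task.get("state") == "TASK_RUNNING"
    ]


# Made-up sample containing only the fields the jq filter reads.
sample = {
    "tasks": [
        {"name": "web", "id": "web.1", "slave_id": "agent-S1", "state": "TASK_RUNNING"},
        {"name": "db", "id": "db.1", "slave_id": "agent-S2", "state": "TASK_RUNNING"},
        {"name": "job", "id": "job.1", "slave_id": "agent-S1", "state": "TASK_FINISHED"},
    ]
}

for name, task_id in running_tasks_on_agent(sample, "agent-S1"):
    print(f"{name}\t\t\t{task_id}")
```

Only the running task on the requested agent is printed; finished tasks and tasks on other agents are skipped.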

Get cluster metadata – cluster Public IP and cluster ID

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/metadata"

Sample Output:

{
"PUBLIC_IPV4": "123.456.789.012",
"CLUSTER_ID": "abcde-abcde-abcde-abcde-abcde-abcde"
}
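
The curl calls in this section all follow the same pattern: the cluster URL plus an endpoint path, with the ACS token sent as a Bearer header. A minimal Python sketch of that pattern is shown here, using only the standard library. The helper names build_dcos_request and fetch_json and the placeholder URL and token are assumptions for illustration; in practice the values come from `dcos config show core.dcos_url` and `dcos config show core.dcos_acs_token`.

```python
import json
import urllib.request


def build_dcos_request(cluster_url, token, path):
    """Build an authenticated GET request for a DC/OS endpoint (no network I/O)."""
    url = cluster_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_json(request):
    """Send the request and decode the JSON body (needs a reachable cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder values, not a real cluster or token.
request = build_dcos_request("https://dcos.example.com/", "TOKEN", "/metadata")
print(request.full_url)
```

Calling fetch_json(request) against a real cluster would return the parsed metadata; the same helper can be pointed at the other endpoints in this section by changing the path.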

Get DC/OS metadata – DC/OS version

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/dcos-metadata/dcos-version.json"

Sample Output:

{
"version": "1.11.0",
"dcos-image-commit": "b6d6ad4722600877fde2860122f870031d109da3",
"bootstrap-id": "a0654657903fb68dff60f6e522a7f241c1bfbf0f"
}

Get Mesos version

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/mesos/version"

Sample Output:

{
"build_date": "2018-02-27 21:31:27",
"build_time": 1519767087.0,
"build_user": "",
"git_sha": "0ba40f86759307cefab1c8702724debe87007bb0",
"version": "1.5.0"
}

Access the DC/OS cluster Exhibitor UI (Exhibitor supervises ZooKeeper and provides a management web interface)

<CLUSTER_URL>/exhibitor

Access DC/OS cluster data from the cluster ZooKeeper using the ZooKeeper Python client (kazoo) – run inside any node / container

from kazoo.client import KazooClient
zk = KazooClient(hosts='leader.mesos:2181', read_only=True)
zk.start()
clusterId = ""
# Here we can give znode path to retrieve its decoded data,
# for ex to get cluster-id, use
# data, stat = zk.get("/cluster-id")
# clusterId = data.decode("utf-8")
# Get cluster Id
if zk.exists("/cluster-id"):
    data, stat = zk.get("/cluster-id")
    clusterId = data.decode("utf-8")
zk.stop()
print (clusterId)

Access DC/OS cluster data from the cluster ZooKeeper using the Exhibitor REST API

# Get znode data using endpoint :
# /exhibitor/exhibitor/v1/explorer/node-data?key=/path/to/node
# Example : Get znode data for path = /cluster-id
curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/exhibitor/exhibitor/v1/explorer/node-data?key=/cluster-id"

Sample Output:

{
"bytes": "3333-XXXXXX",
"str": "abcde-abcde-abcde-abcde-abcde-",
"stat": "XXXXXX"
}

Get cluster name using Mesos API

curl -s -H "Authorization: Bearer $(dcos config show core.dcos_acs_token)" \
"$(dcos config show core.dcos_url)/mesos/state-summary" | jq -r .cluster

Mark Mesos node as decommissioned

Sometimes an instance running as a DC/OS node is terminated and cannot come back online; for example, an AWS EC2 instance cannot be started again once it has been terminated. When Mesos detects that a node has stopped, it puts the node in the UNREACHABLE state, because Mesos does not know whether the node is temporarily stopped and will come back online or is permanently stopped. If we know the node will not come back, we can explicitly tell Mesos to put it in the GONE state.

dcos node decommission <mesos-agent-id>

Conclusion

We learned about Mesosphere DC/OS, its functionality, and its roles. We also learned how to set up and use the DC/OS CLI, how to use HTTP authentication to access the DC/OS APIs, and how to use the DC/OS CLI to automate tasks.

We went through different API endpoints such as Mesos, Marathon, DC/OS metrics, Exhibitor, and DC/OS cluster organization. Finally, we looked at different tricks and scripts to automate DC/OS, such as DC/OS node details, task exec, the Docker report, and DC/OS API HTTP authentication.

December 12, 2022

Enable Real-time Functionality in Your App with GraphQL and Pusher
The most recognized solution for real-time problems is WebSockets (WS), where there is a persistent connection between the client and the server, and either can start sending data at any time. One of the latest implementations of WS is GraphQL subscriptions.

With GraphQL subscriptions, you can easily add real-time functionality to your application. There is a simple, standard way to implement a subscription in a GraphQL app: the client makes a subscription query to the server, specifying the event and the shape of the data it wants. With this query, the client establishes a long-lived connection with the server, on which it listens for specific events. Just as GraphQL solves the over-fetching problem of REST APIs, subscriptions extend that benefit to real-time data.

In this post, we will learn how to bring real-time functionality to your app by implementing GraphQL subscriptions with Pusher to manage Pub/Sub capabilities. The goal is to configure a Pusher channel and implement two subscriptions to be exposed by your GraphQL server. We will be implementing this in a Node.js runtime environment.

Why Pusher?

Why are we doing this using Pusher?
- Pusher, being a hosted real-time services provider, relieves us from managing our own real-time infrastructure, which is a highly complex problem.
- Pusher provides an easy and consistent API.
- Pusher also provides an entire set of tools to monitor and debug your realtime events.
- Events can be triggered by and consumed easily from different applications written in different frameworks.

Project Setup

We will start with a repository that contains a codebase for a simple GraphQL backend in Node.js, which is a minimal representation of a blog post application. The entities included are:
1. Link – Represents a URL and a short description of the Link
2. User – A Link belongs to a User
3. Vote – Represents a user’s vote for a Link
In this application, a User can sign up and add a Link, and other users can upvote it. The database schema is built using Prisma and SQLite for quick bootstrapping. In the backend, we use graphql-yoga as the GraphQL server implementation. To test our GraphQL backend, we use the graphql-playground by Prisma as a client to perform all queries and mutations on the server.

To set up the application:
1. Clone the repository here
2. Install all dependencies using
```
npm install
```
3. Set up a database using prisma-cli with the following commands
```
npx prisma migrate save --experimental
# Select 'yes' at the prompt to add an SQLite db after this command, and enter a name for the migration.
npx prisma migrate up --experimental
npx prisma generate
```
Note: Migrations are experimental features of the Prisma ORM, but you can ignore them because you can have a different backend setup for DB interactions. The purpose of using Prisma here is to quickly set up the project and dive into subscriptions.

A new directory, named prisma, will be created containing the schema and the SQLite database. Now, you have your database and app set up and ready to use.

To start the Node.js application, execute the command:
```
npm start
```
Navigate to http://localhost:4000 to see the graphql-playground where we will execute our queries and mutations.

Our next task is to add a GraphQL subscription to our server to allow clients to listen to the following events:
- A new Link is created
- A Link is upvoted
To add subscriptions, we will need an npm package called graphql-pusher-subscriptions to help us interact with the Pusher service from within the GraphQL resolvers. The module will trigger events and listen to events for a channel from the Pusher service.

Before that, let’s first create a channel in Pusher. To configure a Pusher channel, head to their website at Pusher to create an account. Then, go to your dashboard and create a channels application. Choose a name, the cluster closest to your location, and frontend tech as React and backend tech as Node.js.

You will receive the following code to start.

Now, we add the graphql-pusher-subscriptions package. This package takes the Pusher channel configuration and gives you an API to trigger and listen to events published on the channel.

Now, we import the package in the src/index.js file.
```
const { PusherChannel } = require('graphql-pusher-subscriptions');
```
The PusherChannel class provided by the module accepts a configuration for the channel. We instantiate the class and keep a reference to the resulting object as pubsub, passing in the Pusher config object we received while creating the channel.
```
const pubsub = new PusherChannel({
  appId: '1046878',
  key: '3c84229419ed7b47e5b0',
  secret: 'e86868a98a2f052981a6',
  cluster: 'ap2',
  encrypted: true,
  channel: 'graphql-subscription'
});
```
Now, we add “pubsub” to the context so that it is available to all the resolvers. The channel field tells the client which channel to subscribe to. Here we have the channel “graphql-subscription”.
```
const server = new GraphQLServer({
  typeDefs: './src/schema.graphql',
  resolvers,
  context: request => {
    return {
      ...request,
      prisma,
      pubsub
    }
  },
})
```
The above part enables us to access the methods we need to implement our subscriptions from inside our resolvers via context.pubsub.

Subscribing to Link-created Event

The first step to add a subscription is to extend the GraphQL schema definition.
```
type Subscription {
  newLink: Link
}
```
Next, we implement the resolver for the “newLink” subscription type field. It is important to note that resolvers for subscriptions are different from queries and mutations in minor ways.

1. They return an AsyncIterator instead of data, which is then used by a GraphQL server to publish the event payload to the subscriber client.

2. The subscription resolvers are provided as the value of the “subscribe” field inside an object. The object should also contain another field named “resolve” that returns the payload data from the data emitted by the AsyncIterator.
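
To make the subscribe/resolve split concrete, here is a small language-neutral sketch written in Python rather than JavaScript. It is not the graphql-yoga or graphql-pusher-subscriptions API: TinyPubSub is a hypothetical in-memory stand-in for the Pusher-backed pubsub object, and the dictionary only mimics the shape of a subscription resolver.

```python
import asyncio


class TinyPubSub:
    """Hypothetical in-memory stand-in for a pub/sub channel such as Pusher."""

    def __init__(self):
        self._queues = {}  # event name -> list of subscriber queues

    def publish(self, event, payload):
        # Hand the payload to every subscriber currently bound to this event name.
        for queue in self._queues.get(event, []):
            queue.put_nowait(payload)

    def async_iterator(self, event):
        # Register the subscriber right away, then expose its queue as an async iterator.
        queue = asyncio.Queue()
        self._queues.setdefault(event, []).append(queue)

        async def iterate():
            while True:
                yield await queue.get()

        return iterate()


# Same shape as a subscription resolver: "subscribe" returns the iterator,
# "resolve" turns each emitted item into the payload sent to the client.
new_link = {
    "subscribe": lambda context: context["pubsub"].async_iterator("NEW_LINK"),
    "resolve": lambda payload: payload,
}


async def demo():
    context = {"pubsub": TinyPubSub()}
    stream = new_link["subscribe"](context)
    context["pubsub"].publish("NEW_LINK", {"id": 1, "url": "http://velotio.com"})
    payload = await stream.__anext__()
    await stream.aclose()
    return new_link["resolve"](payload)


result = asyncio.run(demo())
print(result)
```

The event name is the only link between publisher and subscriber, which is why the same string has to be used on both sides.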

To add the resolvers for the subscription, we start by adding a new file called Subscription.js

Inside the project directory, add the file as src/resolvers/Subscription.js

Now, in the new file created, add the following code, which will be the subscription resolver for the “newLink” type we created in GraphQL schema.
```
function newLinkSubscribe(parent, args, context, info) {
  return context.pubsub.asyncIterator("NEW_LINK")
}

const newLink = {
  subscribe: newLinkSubscribe,
  resolve: payload => {
    return payload
  },
}

module.exports = {
  newLink,
}
```
In the code above, the subscription resolver function, newLinkSubscribe, is added as a field value to the property subscribe just as we described before. The context provides reference to the Pub/Sub object, which lets us use the asyncIterator() with “NEW_LINK” as a parameter. This function resolves subscriptions and publishes events.

Adding Subscriptions to Your Resolvers

The final step for our subscription implementation is to call the function above inside of a resolver. We add the following call to pubsub.publish() inside the post resolver function inside Mutation.js file.
```
async function post(parent, args, context, info) {
  const userId = getUserId(context)
  const newLink = await context.prisma.link.create({
    data: {
      url: args.url,
      description: args.description,
      postedBy: { connect: { id: userId } },
    }
  })
  context.pubsub.publish("NEW_LINK", newLink)
  return newLink
}
```
In the code above, we can see that we pass the same string “NEW_LINK” to the publish method as we did in the newLinkSubscribe function in the subscription function before. The “NEW_LINK” is the event name, and it will publish events to the Pusher service, and the same name will be used on the subscription resolver to bind to the particular event name. We also add the newLink as a second argument, which contains the data part for the event that will be published. The context.pubsub.publish function will be triggered before returning the newLink data.

Now, we will update the main resolver object, which is given to the GraphQL server.

First, import the subscription module inside of the index.js file.
```
const Subscription = require('./resolvers/Subscription') 
const resolvers = {
  Query,
  Mutation,
  Subscription,
  User,
  Link,
}
```
Now, with all code in place, we start testing our real time API. We will use multiple instances/tabs of GraphQL playground concurrently.

Testing Subscriptions

If your server is already running, then kill it with CTRL+C and restart with this command:
```
npm start
```
Next, open the browser and navigate to http://localhost:4000 to see the GraphQL playground. We will use one tab of the playground to perform the mutation to trigger the event to Pusher and invoke the subscriber.

We will now start to execute the queries to add some entities in the application.

First, let’s create a user in the application by using the signup mutation. We send the following mutation to the server to create a new User entity.
```
mutation {
    signup(
    name: "Alice"
    email: "alice@prisma.io"
    password: "graphql"
  ) {
    token
    user {
      id
    }
  }
}
```
You will see a response in the playground that contains the authentication token for the user. Copy the token, and open another tab in the playground. Inside that new tab, open the HTTP_HEADERS section in the bottom and add the Authorization header.

Replace the __TOKEN__ placeholder from the below snippet with the copied token from above.
```
{
  "Authorization": "Bearer __TOKEN__"
}
```
Now, all the queries or mutations executed from that tab will carry the authentication token. With this in place, we send the following mutation to our GraphQL server.
```
mutation {
  post(
    url: "http://velotio.com"
    description: "An awesome GraphQL blog"
  ) {
    id
  }
}
```
The mutation above creates a Link entity inside the application. Now that we have created an entity, we move on to testing the subscription. In another tab, we will send the subscription query and create a persistent WebSocket connection to the server. Before firing the subscription query, let us first understand its syntax. It starts with the keyword subscription, followed by the subscription name. The subscription is defined in the GraphQL schema, which shows the data shape we can resolve to. Here, we want to subscribe to the newLink subscription, and the data it resolves is a Link entity. That means we can request any specific part of the Link entity. Here, we are asking for attributes like id, url, and description, and for nested attributes of the postedBy field.
```
subscription {
  newLink {
      id
      url
      description
      postedBy {
        id
        name
        email
      }
  }
}
```
The response of this operation is different from that of a mutation or query. You see a loading spinner, which indicates that it is waiting for an event to happen. This means the GraphQL client (playground) has established a connection with the server and is listening for response data.

Before triggering a subscription, we will also keep an eye on the Pusher channel for events triggered to verify that our Pusher service is integrated successfully.

To do this, we go to the Pusher dashboard, navigate to the channels app we created, and click on the debug console. The debug console will show us the events triggered in real time.

Now that the Pusher dashboard is visible, we will trigger the subscription event by running the following mutation inside a new Playground tab.
```
mutation {
  post(
    url: "www.velotio.com"
    description: "Graphql remote schema stitching"
  ) {
    id
  }
}
```
Now, we observe the Playground where subscription was running.

We can see that the newly created Link is visible in the response section, and the subscription continues to listen, and the event has reached the Pusher service.

You will observe an event on the Pusher console that is the same event and data as sent by your post mutation.

We have achieved our first goal, i.e., we have integrated the Pusher channel and implemented a subscription for a Link creation event.

To achieve our second goal, i.e., to listen to Vote events, we repeat the same steps as we did for the Link subscription.

We add a subscription resolver for Vote in the Subscription.js file and update the Subscription type in the GraphQL schema. To trigger a different event, we use “NEW_VOTE” as the event name and add the publish call inside the resolver for the vote mutation.
```
function newVoteSubscribe(parent, args, context, info) {
  return context.pubsub.asyncIterator("NEW_VOTE")
}

const newVote = {
  subscribe: newVoteSubscribe,
  resolve: payload => {
    return payload
  },
}
```
Update the export statement to add the newVote resolver.
```
module.exports = {
  newLink,
  newVote,
}
```
Update the Vote mutation to add the publish call before returning the newVote data. Notice that the first parameter, “NEW_VOTE”, is being passed so that the listener can bind to the new event with that name.
```
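  // Inside the vote resolver in Mutation.js (earlier lines of the function are omitted;
  // the enclosing function must be async for await to work)
  const newVote = await context.prisma.vote.create({
    data: {
      user: { connect: { id: userId } },
      link: { connect: { id: Number(args.linkId) } },
    }
  })
  // Publish the created vote so that subscribers bound to "NEW_VOTE" receive it
  context.pubsub.publish("NEW_VOTE", newVote)
  return newVote
}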
```
Now, restart the server, complete the signup process, and set HTTP_HEADERS as we did before. Add the following subscription to a new Playground tab.
```
subscription {
  newVote {
    id
    link {
      url
      description
    }
    user {
      name
      email
    }
  }
}
```
In another Playground tab, send the following Vote mutation to the server to trigger the event, but do not forget to verify the Authorization header. The below mutation will add the Vote of the user to the Link. Replace the “__LINK_ID__” with the linkId generated in the previous post mutation.
```
mutation {
  vote(linkId: "__LINK_ID__") {
    link {
      url
      description
    }
    user {
      name
      email
    }
  }
}
```
Observe the event data on the response tab of the vote subscription. You can also check the triggered event on the Pusher dashboard.

The final codebase is available on a branch named with-subscription.

Conclusion

By following the steps above, we saw how easy it is to add real-time features to GraphQL apps with subscriptions. Establishing a connection with the server is no hassle, and it is very similar to how we implement queries and mutations. Unlike the mainstream approach, where one has to build and manage the event handlers, GraphQL subscriptions come with these features built in for the client and server. We also saw how a managed real-time service like Pusher can be used for Pub/Sub events. Together, GraphQL and Pusher can prove to be a solid combination for a reliable real-time system.

Related Articles

1. Build and Deploy a Real-Time React App Using AWS Amplify and GraphQL

2. Scalable Real-time Communication With Pusher
December 12, 2022
Your Complete Guide to Building Stateless Bots Using Rasa Stack
This blog explores the Rasa Stack to create a stateless chatbot. We will look into how the recently released Rasa Core, which provides machine-learning-based dialogue management, helps maintain the context of conversations efficiently.

If you have developed chatbots, you would know how hopelessly bots fail in maintaining the context once complex use-cases need to be developed. There are some home-grown approaches that people currently use to build stateful bots. The most naive approach is to create the state machines where you create different states and based on some logic take actions. As the number of states increases, more levels of nested logic are required or there is a need to add an extra state to the state machine, with another set of rules for how to get in and out of that state. Both of these approaches lead to fragile code that is harder to maintain and update. Anyone who’s built and debugged a moderately complex bot knows this pain.

After building many chatbots, we have experienced that flowcharts are useful for doing the initial design of a bot and describing a few of the known conversation paths, but we shouldn’t hard-code a bunch of rules since this approach doesn’t scale beyond simple conversations.

Thanks to the Rasa guys, there is now a way to go stateless where scaling is not a problem at all. Let’s build a bot using Rasa Core and learn more about this.

Rasa Core: Getting Rid of State Machines

The main idea behind Rasa Core is that thinking of conversations as a flowchart and implementing them as a state machine doesn’t scale. It’s very hard to reason about all possible conversations explicitly, but it’s very easy to tell, mid-conversation, whether a response is right or wrong. For example, let’s consider a term insurance purchase bot, where you have defined different states to take different actions. The diagram below shows an example state machine:

Let’s consider a sample conversation where a user wants to compare two policies listed by policy_search state.

In the above conversation, the comparison can be handled very easily by adding some logic around the intent compare_policies. But real life is not so easy, as the majority of conversations are edge cases. We need to add rules manually to handle such cases, and after testing we realize that these clash with other rules we wrote earlier.

Rasa guys figured out how machine learning can be used to solve this problem. They have released Rasa Core where the logic of the bot is based on a probabilistic model trained on real conversations.

Structure of a Rasa Core App

Let’s go over a few terms we need to know to build a Rasa Core app:

1. Interpreter: An interpreter is responsible for parsing messages. It performs the Natural Language Understanding and transforms the message into structured output i.e. intent and entities. In this blog, we are using Rasa NLU model as an interpreter. Rasa NLU comes under the Rasa Stack. In Training section, it is shown in detail how to prepare the training data and create a model.

2. Domain: To define a domain we create a domain.yml file, which defines the universe of your bot. Following things need to be defined in a domain file:
- Intents: Things we expect the user to say. It is more related to Rasa NLU.
- Entities: These represent pieces of information extracted from what the user said. They are also related to Rasa NLU.
- Templates: We define some template strings which our bot can say. The format for naming a template string is utter_<intent>. These are considered actions which the bot can take.
- Actions: The list of things the bot can do and say. There are two types of actions: those which only utter a message (Templates), and customised actions where some required logic is defined. Customised actions are defined as Python classes and are referenced in the domain file.
- Slots: These are user-defined variables which need to be tracked in a conversation. For example, to buy term insurance we need to keep track of which policy the user selects and the details of the user, so all these details go into slots.
3. Stories: In stories, we define what bot needs to do at what point in time. Based on these stories, a probabilistic model is generated which is used to decide which action to be taken next. There are two ways in which stories can be created which are explained in next section.

Let’s put all these pieces together. When a message arrives in a Rasa Core app, the interpreter first transforms it into structured output, i.e., intents and entities. The Tracker is the object that keeps track of the conversation state; it receives the information that a new message has come in. Then, based on the dialogue model generated from the domain and stories, the policy chooses which action to take next. The chosen action is logged by the tracker, and the response is sent back to the user.
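
The flow described above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not the Rasa Core API: the keyword-based interpret function stands in for a trained NLU model, the NEXT_ACTION lookup table stands in for the probabilistic policy, and utter_fallback is a made-up action name.

```python
class Tracker:
    """Keeps the conversation state: every intent seen and every action taken."""

    def __init__(self):
        self.events = []

    def log(self, kind, value):
        self.events.append((kind, value))


# Toy interpreter data: keyword lookup standing in for a trained NLU model.
KEYWORD_INTENTS = {"hello": "greet", "yes": "affirm", "no": "deny"}

# Toy policy data: a real policy is a model trained on stories, not a table.
NEXT_ACTION = {"greet": "utter_greet", "affirm": "utter_very_much_so", "deny": "utter_decline"}


def interpret(message):
    """Turn raw text into structured output (intent and entities)."""
    for word in message.lower().split():
        if word in KEYWORD_INTENTS:
            return {"intent": KEYWORD_INTENTS[word], "entities": []}
    return {"intent": None, "entities": []}


def choose_action(tracker):
    """Pick the next action from the most recent intent logged in the tracker."""
    latest_intent = next(value for kind, value in reversed(tracker.events) if kind == "intent")
    return NEXT_ACTION.get(latest_intent, "utter_fallback")  # hypothetical fallback name


def handle_message(message, tracker):
    parsed = interpret(message)             # 1. interpreter parses the message
    tracker.log("intent", parsed["intent"])  # 2. tracker records the new message
    action = choose_action(tracker)          # 3. policy chooses the next action
    tracker.log("action", action)            # 4. tracker logs the chosen action
    return action


tracker = Tracker()
print(handle_message("hello there", tracker))
print(handle_message("no thanks", tracker))
```

The point of the sketch is the division of labour: all conversation state lives in the tracker, so the interpreter and the policy themselves stay stateless.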

Training and Running A Sample Bot

We will create a simple Facebook chat-bot named Secure Life which assists you in buying term life insurance. To keep the example simple, we have restricted options such as age-group, term insurance amount, etc.

There are two models we need to train in the Rasa Core app:

Rasa NLU model based on which messages will be processed and converted to a structured form of intent and entities. Create following two files to generate the model:

data.json: Create this training file using the rasa-nlu trainer. Click here to know more about the rasa-nlu trainer.

nlu_model_config.json: This is the configuration file.
```
{
"pipeline": "spacy_sklearn",
"path" : "./models",
"project": "nlu",
"data" : "./data/data.md"
}
```
Run the command below to train the rasa-nlu model:
```
$ python -m rasa_nlu.train -c nlu_model_config.json --fixed_model_name current
```
Dialogue Model: This model is trained on stories we define, based on which the policy will take the action. There are two ways in which stories can be generated:
- Supervised Learning: In this type of learning, we create the stories by hand, writing them directly in a file. They are easy to write, but for complex use-cases it is difficult to cover all scenarios.
- Reinforcement Learning: The user provides feedback on every decision taken by the policy. This is also known as interactive learning, and it helps include edge cases which are difficult to create by hand. How does it work? Every time the policy chooses an action, the user is asked whether the chosen action is correct. If the action is wrong, you can correct it on the fly and store the resulting stories to train the model again.
Since the example is simple, we have used the supervised learning method to generate the dialogue model. Below is the stories.md file.
```
## All yes
* greet
- utter_greet
* affirm
- utter_very_much_so
* affirm
- utter_gender
* gender
- utter_coverage_duration
- action_gender
* affirm
- utter_nicotine
* affirm
- action_nicotine
* age
- action_thanks

## User not interested
* greet
- utter_greet
* deny
- utter_decline

## Coverage duration is not sufficient
* greet
- utter_greet
* affirm
- utter_very_much_so
* affirm
- utter_gender
* gender
- utter_coverage_duration
- action_gender
* deny
- utter_decline
```
Run the command below to train the dialogue model:
```
$ python -m rasa_core.train -s <path to stories.md file> -d <path to domain.yml> -o models/dialogue --epochs 300
```
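To make the story format concrete, the sketch below reads story text into a plain Python structure: each `##` line starts a story, each `*` line is a user intent, and each `-` line is an action the bot takes in response. The parse_stories helper is hypothetical and written only for illustration; it is not how Rasa Core loads stories internally.

```python
def parse_stories(markdown_text):
    """Parse story blocks into {story name: [(intent, [actions...]), ...]}."""
    stories = {}
    steps = None
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            # A new story begins; later steps are appended to its list.
            steps = stories.setdefault(line[3:].strip(), [])
        elif line.startswith("* ") and steps is not None:
            # A user intent opens a new step with no actions yet.
            steps.append((line[2:].strip(), []))
        elif line.startswith("- ") and steps:
            # A bot action belongs to the most recent intent.
            steps[-1][1].append(line[2:].strip())
    return stories


example = """
## User not interested
* greet
- utter_greet
* deny
- utter_decline
"""

parsed = parse_stories(example)
print(parsed)
```

A structure like this makes it easy to check, for example, that every action used in the stories is also declared in the domain file.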
Define a Domain: Create a domain.yml file containing all the required information. Under intents and entities, list everything the bot is expected to recognize when the user says something, i.e., the intents and entities you defined in the Rasa NLU training file.
```
intents:
  - greet
  - goodbye
  - affirm
  - deny
  - age
  - gender

slots:
  gender:
    type: text
  nicotine:
    type: text
  agegroup:
    type: text

templates:
  utter_greet:
    - "hey there! welcome to Secure-Life!\nI can help you quickly estimate your rate of coverage.\nWould you like to do that ?"

  utter_very_much_so:
    - "Great! Let's get started.\nWe currently offer term plans of Rs. 1Cr. Does that suit your need?"

  utter_gender:
    - "What gender do you go by ?"

  utter_coverage_duration:
    - "We offer this term plan for a duration of 30Y. Do you think that's enough to cover entire timeframe of your financial obligations ?"

  utter_nicotine:
    - "Do you consume nicotine-containing products?"

  utter_age:
    - "And lastly, how old are you ?"

  utter_thanks:
    - "Thank you for providing all the info. Let me calculate the insurance premium based on your inputs."

  utter_decline:
    - "Sad to see you go. In case you change your plans, you know where to find me :-)"

  utter_goodbye:
    - "goodbye :("

actions:
  - utter_greet
  - utter_goodbye
  - utter_very_much_so
  - utter_coverage_duration
  - utter_age
  - utter_nicotine
  - utter_gender
  - utter_decline
  - utter_thanks
  - actions.ActionGender
  - actions.ActionNicotine
  - actions.ActionThanks
```
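Since the dialogue model is trained from both stories.md and domain.yml, an intent that appears in a story but is missing from the domain is an easy mistake to make. The snippet below is a minimal sketch of such a check, not something Rasa Core provides in this form. The helper names `intents_in_stories` and `undeclared_intents` are hypothetical, and the domain intents are passed in as a plain Python set instead of being read from domain.yml.

```python
def intents_in_stories(stories_text):
    """Collect intent names from story lines that start with '* '."""
    intents = set()
    for line in stories_text.splitlines():
        line = line.strip()
        if line.startswith("* "):
            # Drop any entity annotation such as 'gender{"gender": "male"}'.
            intents.add(line[2:].split("{", 1)[0].strip())
    return intents


def undeclared_intents(stories_text, domain_intents):
    """Return story intents missing from the domain, sorted for stable output."""
    return sorted(intents_in_stories(stories_text) - set(domain_intents))


# One of the stories from this article, inlined so the sketch is self-contained.
SAMPLE_STORIES = """
## User not interested
* greet
- utter_greet
* deny
- utter_decline
"""

# The intents listed in the domain file above.
DOMAIN_INTENTS = {"greet", "goodbye", "affirm", "deny", "age", "gender"}

if __name__ == "__main__":
    # An empty list means every intent used in the stories is declared.
    print(undeclared_intents(SAMPLE_STORIES, DOMAIN_INTENTS))
```

Running a check like this before training can catch typos in intent names early, before they show up as odd predictions from the trained model.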
Define Actions: Templates defined in domain.yml are also considered actions. A sample custom action is shown below; it sets a slot named gender to a value based on the option the user selected.
```
from rasa_core.actions.action import Action
from rasa_core.events import SlotSet


class ActionGender(Action):
    def name(self):
        return 'action_gender'

    def run(self, dispatcher, tracker, domain):
        # Lower-case the latest user message for case-insensitive matching.
        messageObtained = tracker.latest_message.text.lower()

        # Check "female" first: "male" is a substring of "female".
        if "female" in messageObtained:
            return [SlotSet("gender", "female")]
        elif "male" in messageObtained:
            return [SlotSet("gender", "male")]
        else:
            return [SlotSet("gender", "others")]
```
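One detail worth keeping in mind with substring checks like the one above: the string "male" is contained in "female", so the way the message is matched affects which slot value gets set. The snippet below is a minimal sketch of a whole-word alternative that runs without Rasa Core. The function name `detect_gender` is hypothetical and only illustrates the matching logic that a custom action could call.

```python
import re


def detect_gender(message):
    """Map a free-text reply to 'male', 'female' or 'others' using whole words."""
    # Split the lower-cased message into alphabetic words.
    words = set(re.findall(r"[a-z]+", message.lower()))
    if "female" in words:
        return "female"
    if "male" in words:
        return "male"
    return "others"


if __name__ == "__main__":
    for reply in ("Male", "I am female", "prefer not to say"):
        print(reply, "->", detect_gender(reply))
```

Because the comparison is made on whole words, the order of the two checks no longer matters, and a reply such as "I am female" cannot be mistaken for "male".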
Running the Bot

Create a Facebook app and get the app credentials. Create a bot.py file as shown below:
```
import logging

from rasa_core import utils
from rasa_core.agent import Agent
from rasa_core.interpreter import RasaNLUInterpreter
from rasa_core.channels import HttpInputChannel
from rasa_core.channels.facebook import FacebookInput

logger = logging.getLogger(__name__)


def run(serve_forever=True):
    # create rasa NLU interpreter
    interpreter = RasaNLUInterpreter("models/nlu/current")
    agent = Agent.load("models/dialogue", interpreter=interpreter)

    input_channel = FacebookInput(
        fb_verify="your_fb_verify_token",  # you need to tell Facebook this token, to confirm your URL
        fb_secret="your_app_secret",  # your app secret
        fb_tokens={"your_page_id": "your_page_token"},  # page ids + tokens you subscribed to
        debug_mode=True  # enable debug mode for underlying fb library
    )

    if serve_forever:
        agent.handle_channel(HttpInputChannel(5004, "/app", input_channel))
    return agent


if __name__ == '__main__':
    utils.configure_colored_logging(loglevel="DEBUG")
    run()
```
Run the file and your bot is ready to test. Sample conversations are provided below:

Summary

You have seen how Rasa Core makes it easier to build bots. Just create a few files and boom! Your bot is ready! Isn’t it exciting? I hope this blog gave you some insight into how Rasa Core works. Start exploring, and let us know if you need any help building chatbots with Rasa Core.
December 12, 2022
