Exploring OpenAI Gym: A Platform for Reinforcement Learning Algorithms

Introduction

According to the OpenAI Gym GitHub repository, “OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.”

OpenAI Gym has an environment-agent arrangement. Gym provides the “environment”, and you write an “agent” that performs specific actions in it. In return, the agent gets an observation and a reward as a consequence of performing a particular action in the environment.

There are four values that are returned by the environment for every “step” taken by the agent.

Observation (object): an environment-specific object representing your observation of the environment, for example the board state in a board game.
Reward (float): the amount of reward/score achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward/score.
Done (boolean): whether it’s time to reset the environment again, for example because you lost your last life in the game.
Info (dict): diagnostic information useful for debugging. However, official evaluations of your agent are not allowed to use this for learning.
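
The loop below is a minimal sketch of how these four values are typically consumed. To keep it runnable without installing Gym, it uses a hypothetical stand-in class, `CountdownEnv`, which only mimics the `reset()`/`step()` contract described above; it is not part of Gym, and its rewards and rules are made up for illustration.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment (not part of Gym) with a Gym-style interface."""

    def __init__(self, lives=3):
        self.start_lives = lives
        self.lives = lives

    def reset(self):
        # Start a new episode and return the first observation.
        self.lives = self.start_lives
        return self.lives

    def step(self, action):
        # Action 0 is "safe"; action 1 is "risky" and costs one life.
        if action == 1:
            self.lives -= 1
        observation = self.lives          # environment-specific object
        reward = 1.0                      # float score for this step
        done = self.lives == 0            # True when the episode is over
        info = {"last_action": action}    # diagnostics only, not for learning
        return observation, reward, done, info


def run_episode(env, rng, max_steps=100):
    """Play one episode with random actions and return the total reward."""
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = rng.randrange(0, 2)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


total = run_episode(CountdownEnv(lives=3), random.Random(0))
print("total reward:", total)
```

The same unpacking, `observation, reward, done, info = env.step(action)`, is what the Cart-Pole code later in this post relies on.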

The following environments are available in Gym:

Classic control and toy text
Algorithmic
Atari
2D and 3D robots

Here you can find a full list of environments.

Cart-Pole Problem

Here we will try to solve a classic control problem from the Reinforcement Learning literature, “The Cart-pole Problem”.

The Cart-pole problem is defined as follows:
“A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.”
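
The termination and reward rules in this definition can be written down in a few lines. The sketch below is only an illustration of the quoted rule, not Gym's implementation: the function names and the sample trajectory are hypothetical, and it counts a reward only for timesteps on which the pole is still within bounds.

```python
def episode_over(pole_angle_deg, cart_position,
                 max_angle_deg=15.0, max_position=2.4):
    """Return True if the episode ends under the rule quoted above."""
    return abs(pole_angle_deg) > max_angle_deg or abs(cart_position) > max_position


def total_reward(trajectory):
    """Add +1 per timestep until the first terminating state.

    `trajectory` is a list of (pole_angle_deg, cart_position) pairs.
    """
    reward = 0.0
    for angle, position in trajectory:
        if episode_over(angle, position):
            break
        reward += 1.0
    return reward


# Made-up trajectory: the pole tips past 15 degrees on the fourth step.
trajectory = [(0.0, 0.0), (5.0, 0.3), (12.0, 0.9), (16.5, 1.2), (20.0, 1.5)]
print(total_reward(trajectory))
```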

The following code lets you quickly see what the problem looks like on your computer.

import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())

This is what the output will look like:

Coding the neural network

#We first import the necessary libraries and define hyperparameters - 

import gym
import random
import numpy as np
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
from statistics import median, mean
from collections import Counter

LR = 2.33e-4
env = gym.make("CartPole-v0")
observation = env.reset()
goal_steps = 500
score_requirement = 50
initial_games = 10000

#Now we will define a function to generate training data - 

def initial_population():
    # [OBS, MOVES]
    training_data = []
    # all scores:
    scores = []
    # scores above our threshold:
    accepted_scores = []
    # number of episodes
    for _ in range(initial_games):
        score = 0
        # moves specifically from this episode:
        episode_memory = []
        # previous observation that we saw
        prev_observation = []
        for _ in range(goal_steps):
            # choose random action left or right i.e (0 or 1)
            action = random.randrange(0,2)
            observation, reward, done, info = env.step(action)
            # since that the observation is returned FROM the action
            # we store previous observation and corresponding action
            if len(prev_observation) > 0 :
                episode_memory.append([prev_observation, action])
            prev_observation = observation
            score+=reward
            if done: break

        # reinforcement methodology here.
        # IF our score is higher than our threshold, we save
        # all we're doing is reinforcing the score, we're not trying
        # to influence the machine in any way as to HOW that score is
        # reached.
        if score >= score_requirement:
            accepted_scores.append(score)
            for data in episode_memory:
                # convert to one-hot (this is the output layer for our neural network)
                if data[1] == 1:
                    output = [0,1]
                elif data[1] == 0:
                    output = [1,0]

                # saving our training data
                training_data.append([data[0], output])

        # reset env to play again
        env.reset()
        # save overall scores
        scores.append(score)

    # return the accepted [observation, one-hot action] pairs for training
    return training_data

# Now using tflearn we will define our neural network 

def neural_network_model(input_size):

    network = input_data(shape=[None, input_size, 1], name='input')

    network = fully_connected(network, 128, activation='relu')
    network = dropout(network, 0.8)

    network = fully_connected(network, 256, activation='relu')
    network = dropout(network, 0.8)

    network = fully_connected(network, 512, activation='relu')
    network = dropout(network, 0.8)

    network = fully_connected(network, 256, activation='relu')
    network = dropout(network, 0.8)

    network = fully_connected(network, 128, activation='relu')
    network = dropout(network, 0.8)

    network = fully_connected(network, 2, activation='softmax')
    network = regression(network, optimizer='adam', learning_rate=LR, loss='categorical_crossentropy', name='targets')
    model = tflearn.DNN(network, tensorboard_dir='log')

    return model

#It is time to train the model now -

def train_model(training_data, model=False):

    X = np.array([i[0] for i in training_data]).reshape(-1,len(training_data[0][0]),1)
    y = [i[1] for i in training_data]

    if not model:
        model = neural_network_model(input_size = len(X[0]))

    model.fit({'input': X}, {'targets': y}, n_epoch=5, snapshot_step=500, show_metric=True, run_id='openai_CartPole')
    return model

training_data = initial_population()

model = train_model(training_data)

#Training complete, now we should play the game to see how the output looks like 

scores = []
choices = []
for each_game in range(10):
    score = 0
    game_memory = []
    prev_obs = []
    env.reset()
    for _ in range(goal_steps):
        env.render()

        if len(prev_obs)==0:
            action = random.randrange(0,2)
        else:
            action = np.argmax(model.predict(prev_obs.reshape(-1,len(prev_obs),1))[0])

        choices.append(action)

        new_observation, reward, done, info = env.step(action)
        prev_obs = new_observation
        game_memory.append([new_observation, action])
        score+=reward
        if done: break

    scores.append(score)

print('Average Score:',sum(scores)/len(scores))
print('choice 1:{}  choice 0:{}'.format(float((choices.count(1))/float(len(choices)))*100,float((choices.count(0))/float(len(choices)))*100))
print(score_requirement)

This is what the result will look like:

Conclusion

Although we have not used a true Reinforcement Learning model in this blog, a plain fully connected neural network trained on the best random episodes gave us a satisfactory accuracy of 60%. We used tflearn, a higher-level API on top of TensorFlow that speeds up experimentation. We hope that this blog will give you a head start in using OpenAI Gym.

We are waiting to see exciting implementations using Gym and Reinforcement Learning. Happy Coding!

December 12, 2022

A Step Towards Machine Learning Algorithms: Univariate Linear Regression
These days the concept of Machine Learning is evolving rapidly. The field is so vast and open that everyone has their own independent thoughts about it. Here I am sharing mine. This blog is based on my experience with learning algorithms. In it, we will look at the basic difference between Artificial Intelligence, Machine Learning, and Deep Learning. We will also get to know the foundational Machine Learning algorithm, i.e. Univariate Linear Regression.

An intermediate knowledge of Python and its libraries (NumPy, pandas, Matplotlib) is a good starting point. For the mathematics, a little knowledge of algebra, calculus, and graphs will help you understand the trick behind the algorithm.

A way to Artificial Intelligence, Machine Learning, and Deep Learning

These are the three buzzwords of today’s Internet world, and many see in them the future of programming. Specifically, this is the place where the science domain meets programming: we use scientific concepts and mathematics with a programming language to simulate the decision-making process.

Artificial Intelligence is the ability of a machine or program to make decisions more as humans do. Machine Learning is an approach that supports Artificial Intelligence: it helps the machine observe patterns and learn from them in order to make a decision. Here the programming helps in observing the patterns, not in making the decisions. Machine learning needs more and more data from various sources, covering all the variables of a given pattern, to make more accurate decisions. Deep learning in turn supports machine learning: it is a subset of machine learning that uses multi-layer neural networks to learn patterns from the data.

What is Machine Learning

Definition: Machine Learning gives machines the ability to learn autonomously from experience, observations, and patterns in a given dataset, without being explicitly programmed.

This is a two-part process. In the first part, the algorithm observes and analyses the patterns in the given data and makes a shrewd guess at a mathematical function that closely matches the pattern. There are various kinds of functions for this, such as linear, non-linear, and logistic. We then calculate the error function from the guessed mathematical function and the given data. In the second part, we minimize the error function. The function that minimizes the error is used to predict the pattern.

Here are the general steps to understand the process of Machine Learning:
1. Plot the given dataset on the x-y axes.
2. By looking at the graph, guess a mathematical function that is close to the pattern.
3. Derive the error function from the given dataset and the guessed mathematical function.
4. Try to minimize the error function using some algorithm.
5. The minimized error function gives us a more accurate mathematical function for the given pattern.
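
The steps above can be sketched in a few lines of NumPy. The data values here are made up for illustration (roughly following y = 2x + 1), and `np.polyfit` is used only as a convenient stand-in for "some algorithm" in step 4; the rest of this post uses gradient descent instead.

```python
import numpy as np

# Step 1: a made-up dataset (illustrative values only), roughly y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])


# Step 2: guess the form of the function, here a straight line.
def guess(x_values, w, b):
    return w * x_values + b


# Step 3: the error function compares the guess with the data.
def mean_squared_error(w, b):
    return np.mean((guess(x, w, b) - y) ** 2)


# Step 4: minimize the error. np.polyfit solves the least-squares problem
# directly and returns the coefficients with the highest degree first.
w_best, b_best = np.polyfit(x, y, deg=1)

# Step 5: the minimized error gives the best-fitting line.
print("error of a first guess (w=1, b=0):", mean_squared_error(1.0, 0.0))
print("fitted w, b:", w_best, b_best)
print("error of the fitted line:", mean_squared_error(w_best, b_best))
```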

Getting Started with the First Algorithm: Univariate Linear Regression

Linear Regression is a very basic algorithm; we can call it the first and foundational algorithm for understanding the concepts of ML. We will try to understand it with an example: a dataset of plot prices for given areas.
```
movieID	title	userID	rating	timestamp
0	1	Toy story	170	3.0	1162208198000
1	1	Toy story	175	4.0	1133674606000
2	1	Toy story	190	4.5	1057778398000
3	1	Toy story	267	2.5	1084284499000
4	1	Toy story	325	4.0	1134939391000
5	1	Toy story	493	3.5	1217711355000
6	1	Toy story	533	5.0	1050012402000
7	1	Toy story	545	4.0	1162333326000
8	1	Toy story	580	5.0	1162374884000
9	1	Toy story	622	4.0	1215485147000
10	1	Toy story	788	4.0	1188553740000
```
With this data, we can easily look up the price of a plot for any area listed in it. But what if we want the price of a plot with an area of 5.0 * 10 sq mtr? There is no direct price for this in our dataset. So how can we get the price of a plot whose area is not given in the dataset? We can do this using Linear Regression.

First, we will plot this data on a graph.

The graph below shows the area of the plots (in units of 10 sq mtr) on the x-axis and their prices (in Lakhs INR) on the y-axis.

Definition of Linear Regression

The objective of a linear regression model is to find a relationship between one or more features (independent variables) and a continuous target variable (dependent variable). When there is only one feature, it is called Univariate Linear Regression; if there are multiple features, it is called Multiple Linear Regression.

Hypothesis function:

Here we will try to find the relation between the price and the area of the plots. As this is a univariate example, the price depends only on the area of the plot.

By observing this pattern we can have our hypothesis function as below:

f(x) = w * x + b

where w is the weight and b is the bias.

Different values of (w, b) give different lines, but for one particular pair of values the line will be closest to the pattern.

When we generalize this function to multiple variables, there is a set of w values; these constants, together with b, are termed the model parameters.

Note: A range of mathematical functions could fit this pattern, and the choice of function is entirely up to us. The points to take care of are that the function should neither underfit nor overfit the data, and that it should be continuous and differentiable, so that we can easily differentiate it and find its global minimum or maximum.
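
As a small illustration, the hypothesis function can be written directly in Python. The x values and the (w, b) pairs below are arbitrary examples, not taken from the dataset.

```python
import numpy as np


def f(x, w, b):
    """Hypothesis function: a straight line with weight w and bias b."""
    return w * x + b


areas = np.array([1.0, 2.0, 3.0])  # illustrative x values only

# Each (w, b) pair defines a different candidate line.
for w, b in [(0.5, 0.0), (1.0, 1.0), (2.0, -1.0)]:
    print(f"w={w}, b={b} ->", f(areas, w, b))
```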

Error for a point

As our hypothesis function is continuous, for every area $x_i$ there is one predicted price $\hat{y}_i = f(x_i)$, while $y_i$ is the actual price.

So the error at any point is

$$E_i = \hat{y}_i - y_i = f(x_i) - y_i$$

These errors are also called residuals. A residual can be positive (if the actual point lies below the predicted line) or negative (if the actual point lies above the predicted line). Our aim is to minimize the residual at each of the points.

Note: While observing the pattern, a few points may lie very far from it. The residuals for these far points will be much larger, so if such points are few in number, we can leave them out on the assumption that they are errors in the dataset. Such points are termed outliers.
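
A short sketch of the residual calculation, using made-up data and an arbitrary candidate line, shows how the sign of each residual relates to the position of the actual point:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y_actual = np.array([2.5, 3.0, 7.5, 8.0])   # made-up values for illustration

w, b = 2.0, 0.0                              # an arbitrary candidate line
y_predicted = w * x + b
residuals = y_predicted - y_actual           # predicted minus actual

for xi, ei in zip(x, residuals):
    side = "below" if ei > 0 else "above" if ei < 0 else "on"
    print(f"x={xi}: residual={ei:+.1f}, actual point lies {side} the line")
```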

Energy Functions

As there are m training points, we can calculate the average energy function as below:

$$E(w, b) = \frac{1}{m} \sum_{i=1}^{m} E_i$$

and our motive is to minimize the energy function:

$$\min_{w, b} E(w, b)$$

Little Calculus: For a differentiable function, minima and maxima occur at points where the first derivative is zero. If the second derivative at such a point is negative, it is a maximum; if it is positive, it is a minimum.

Here we do the trick: we convert our energy function into an upward-opening parabola by squaring the error. This ensures that the energy function has only one global minimum (the point of our concern). It also simplifies the calculation: the point where the first derivative of the energy function is zero is the point we need, and the value of (w, b) at that point is our required solution.

So our final energy function is

$$E(w, b) = \frac{1}{2m} \sum_{i=1}^{m} E_i^2$$

Dividing by 2 does not affect our result, and it cancels out when we differentiate, because the first derivative of $x^2$ is $2x$.
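
To see why the squaring matters, the sketch below compares the average of the signed errors with the squared energy function on made-up data that lies exactly on the line y = 2x. The function names are illustrative. A flat line through the mean of y is clearly a poor fit, yet its positive and negative errors cancel out, while the squared energy still exposes the misfit.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # made-up data on the line y = 2x


def residuals(w, b):
    return (w * x + b) - y


def mean_error(w, b):
    """Average of the signed errors: positive and negative errors can cancel."""
    return np.mean(residuals(w, b))


def squared_energy(w, b):
    """Energy with squared errors and the 1/(2m) factor."""
    return np.sum(residuals(w, b) ** 2) / (2 * len(x))


# A flat line at y = 6 is a poor fit, but its signed errors sum to zero.
print("flat line:", mean_error(0.0, 6.0), squared_energy(0.0, 6.0))
# The true line has zero error under both definitions.
print("true line:", mean_error(2.0, 0.0), squared_energy(2.0, 0.0))
```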

Gradient Descent Method

Gradient descent is a generic optimization algorithm. It iteratively adjusts the parameters of the model, by trial and improvement, in order to minimize the energy function.

In the above picture, we can see on the right side:
1. w0 and w1 are the random initialization, and by following gradient descent the point moves towards the global minimum.
2. The number of turns of the black line is the number of iterations, which must be neither too large nor too small.
3. The distance between the turns is alpha, i.e. the learning rate.
By solving the equation on the left side, we get the model parameters at the global minimum of the energy function.
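
For the univariate hypothesis f(x) = w * x + b and the squared energy function with the 1/(2m) factor, the gradient descent update can be worked out as follows. This is a standard derivation written in this post's notation, with alpha as the learning rate; it is not a transcription of the equation in the picture.

$$
\begin{aligned}
E(w, b) &= \frac{1}{2m} \sum_{i=1}^{m} \left(w x_i + b - y_i\right)^2 \\
\frac{\partial E}{\partial w} &= \frac{1}{m} \sum_{i=1}^{m} \left(w x_i + b - y_i\right) x_i \\
\frac{\partial E}{\partial b} &= \frac{1}{m} \sum_{i=1}^{m} \left(w x_i + b - y_i\right) \\
w &\leftarrow w - \alpha \, \frac{\partial E}{\partial w}, \qquad
b \leftarrow b - \alpha \, \frac{\partial E}{\partial b}
\end{aligned}
$$

The factor 2 produced by differentiating the square cancels the 1/2 in front of the sum, which is why the energy function was defined with 1/(2m).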

Points to consider at the time of Gradient Descent calculations:
1. Random initialization: We start the algorithm at a random point, i.e. a random (w, b) value. As it proceeds, the algorithm decides in which direction the next trial has to be taken. Since the energy function is an upward-opening parabola, moving in the right direction (towards the global minimum) gives a smaller value than the previous point.
2. Number of iterations: The number of iterations must be neither too large nor too small. If it is too small, we will not reach the global minimum; if it is too large, we do extra calculations around the global minimum.
3. Alpha as the learning rate: When alpha is too small, gradient descent is slow because it takes many small steps to reach the global minimum. If alpha is too big, it can overshoot the global minimum, and the algorithm may keep oscillating around it or even diverge instead of converging.
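
The effect of the learning rate can be seen in a small experiment. The sketch below is an illustration under assumed settings: the data is made up (points on the line y = 2x + 1), the function name is hypothetical, and the three alpha values were chosen only to show a too-small, a reasonable, and a too-large setting for this particular data.

```python
import numpy as np

# Made-up data on the line y = 2x + 1 (illustrative only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0


def gradient_descent(alpha, num_iterations, w=0.0, b=0.0):
    """Run gradient descent and return (w, b, final energy)."""
    m = len(x)
    for _ in range(num_iterations):
        error = (w * x + b) - y
        grad_w = error.dot(x) / m      # dE/dw for the 1/(2m) energy function
        grad_b = error.sum() / m       # dE/db for the 1/(2m) energy function
        w, b = w - alpha * grad_w, b - alpha * grad_b
    energy = np.sum(((w * x + b) - y) ** 2) / (2 * m)
    return w, b, energy


results = {}
for alpha in (0.001, 0.1, 0.5):
    w, b, energy = gradient_descent(alpha, num_iterations=200)
    results[alpha] = energy
    print(f"alpha={alpha}: w={w:.3g}, b={b:.3g}, energy={energy:.3g}")
```

With the same number of iterations, the smallest alpha is still far from the minimum, the middle value gets very close to w = 2 and b = 1, and the largest value overshoots so badly that the energy grows instead of shrinking.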

Implementation of Gradient Descent in Python
```
"""Read the CSV file using pandas and use the data for linear regression."""
# Requires Python 3.6+ (f-strings are used below).

# Library to read the csv file effectively
import pandas
import numpy as np


# Method to read the csv file
def load_data(file_name):
    column_names = ['area', 'price']
    # header=0 skips the header row of the file and applies column_names instead
    data = pandas.read_csv(file_name, names=column_names, header=0)
    # Convert both columns to float arrays
    x_val = data['area'].to_numpy(dtype=float)
    y_val = data['price'].to_numpy(dtype=float)
    return x_val, y_val


# Call the method for a specific file
x_raw, y_raw = load_data('area-price.csv')
y = y_raw

# Modeling
w, b = 0.1, 0.1
num_epoch = 100
# Mean squared error recorded at every epoch
converge_rate = np.zeros([num_epoch, 1], dtype=float)
learning_rate = 1e-3
for e in range(num_epoch):
    # Calculate the gradient of the loss function with respect to the model parameters manually.
    # These are the gradients of the summed squared error, (1/2) * sum(E_i^2).
    y_predicted = w * x_raw + b
    grad_w, grad_b = (y_predicted - y).dot(x_raw), (y_predicted - y).sum()
    # Update parameters.
    w, b = w - learning_rate * grad_w, b - learning_rate * grad_b
    converge_rate[e] = np.mean(np.square(y_predicted - y))

print(w, b)
print(f"predicted function f(x) = x * {w} + {b}")
calculatedprice = (10 * w) + b
print(f"price of plot with area 10 sqmtr = 10 * {w} + {b} = {calculatedprice}")
```
This is a basic implementation of the Gradient Descent algorithm using NumPy and pandas. It reads the area-price.csv file. The x-axis values are normalized for better readability of the data points on the graph. We have taken (w, b) = (0.1, 0.1) as the random initialization, 100 as the number of iterations, and 0.001 as the learning rate.

In every iteration, we calculate new values of w and b and record the mean squared error to track the convergence rate.

We can repeat this calculation of (w, b) for different values of the random initialization, the number of iterations, and the learning rate (alpha).

Note: TensorFlow is another Python library that is preferable for such calculations; it has built-in functions for Gradient Descent. For better understanding, however, we have used the NumPy and pandas libraries here.

RMSE (Root Mean Square Error)

RMSE: This is a method to verify how accurate our calculation of (w, b) is. Below is the basic formula for RMSE, where $f(x_i)$ is the predicted value and $y_i$ is the observed value:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left(f(x_i) - y_i\right)^2}$$

Note: There is no absolute good or bad threshold for RMSE; however, we can judge it relative to the range of our observed values. For observed values ranging from 0 to 1000, an RMSE of 0.7 is small, but if the range is 0 to 1, it is not that small.
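
A minimal sketch of the RMSE calculation in NumPy is shown below. The function name and the sample values are made up for illustration; the last line divides the RMSE by the range of the observed values, which is one simple way to apply the note above.

```python
import numpy as np


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return np.sqrt(np.mean((predicted - observed) ** 2))


observed = np.array([10.0, 20.0, 30.0])     # made-up observed values
predicted = np.array([12.0, 18.0, 33.0])    # made-up predictions

error = rmse(predicted, observed)
value_range = observed.max() - observed.min()
print("RMSE:", error)
print("RMSE relative to the observed range:", error / value_range)
```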

Conclusion

In this article, we have seen a short introduction to Machine Learning and the need for it. Then, with the help of a very basic example, we learned about one of the foundational learning algorithms, i.e. Linear Regression (univariate only). This can also be generalized to the multivariate case. We then used the Gradient Descent method to calculate the model parameters for Linear Regression, and learned the basic flow of Gradient Descent. Finally, there is one example in Python that demonstrates Linear Regression via Gradient Descent.
December 12, 2022