TensorFlow Way for Linear Regression

In my two previous posts, we saw how to perform Linear Regression with TensorFlow using Linear Least Squares and the Cholesky Decomposition. Both rely on matrix operations to solve the regression, so TensorFlow isn’t strictly required for them; a more general package like NumPy would do.

One of the most common uses of TensorFlow is training a model by minimizing a loss function through backpropagation, feeding the data in batches; this lets gradient descent converge to a solution incrementally.

Advantage: it lets us work with large amounts of data, since TensorFlow processes it in batches.
Disadvantage: the solution is not as precise as the one obtained with Linear Least Squares or the Cholesky Decomposition.

In the following example I’ll use the Iris data from scikit-learn (Sepal Length vs Petal Width appears to have a linear relationship of the form y = m·x + b) and an L2 loss function:

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn import datasets

sess = tf.Session()
iris = datasets.load_iris()

x_vals = np.array([x[3] for x in iris.data])
y_vals = np.array([x[0] for x in iris.data])

learning_rate = 0.25
batch_size = 25

x_data = tf.placeholder(shape=(None, 1), dtype=tf.float32)
y_target = tf.placeholder(shape=(None, 1), dtype=tf.float32)
m = tf.Variable(tf.random_normal(shape=[1, 1]))
b = tf.Variable(tf.random_normal(shape=[1, 1]))

model_output = tf.add(tf.matmul(x_data, m), b)

loss = tf.reduce_mean(tf.square(y_target - model_output))
init = tf.global_variables_initializer()
sess.run(init)

my_opt = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_step = my_opt.minimize(loss)

loss_vec = []
for i in range(100):
    rand_index = np.random.choice(len(x_vals), size=batch_size)
    rand_x = np.transpose([x_vals[rand_index]])
    rand_y = np.transpose([y_vals[rand_index]])
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
    temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})
    loss_vec.append(temp_loss)

    if (i + 1) % 25 == 0:
        print('Step #' + str(i + 1) + ' A = ' + str(sess.run(m)) + ' b = ' + str(sess.run(b)))
        print('Loss = ' + str(temp_loss))

[m_slope] = sess.run(m)
[y_intercept] = sess.run(b)
best_fit = []
for i in x_vals:
    best_fit.append(m_slope * i + y_intercept)

plt.plot(x_vals, y_vals, 'o', label='Points')
plt.plot(x_vals, best_fit, 'r-', label='Linear Reg.', linewidth=3)
plt.legend(loc='upper left')
plt.title('Sepal Length vs Petal Width')
plt.xlabel('Petal Width')
plt.ylabel('Sepal Length')
plt.show()
plt.plot(loss_vec, 'k-')
plt.title('L2 Loss')
plt.xlabel('Batches')
plt.ylabel('L2 Loss')
plt.show()

>> Step #25 A = [[0.90450525]] b = [[4.513903]]
>> Loss = 0.26871145
>> Step #50 A = [[0.94802374]] b = [[4.7737966]]
>> Loss = 0.2799243
>> Step #75 A = [[1.1669612]] b = [[4.906186]]
>> Loss = 0.2010894
>> Step #100 A = [[0.84404486]] b = [[4.730052]]
>> Loss = 0.15839271

Conclusion:

As we can see in the plot, gradient descent converges toward the solution, but it is not as precise as the matrix-based approaches (Linear Least Squares and the Cholesky Decomposition).
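As a quick sanity check (a minimal sketch of my own, reusing the x_vals, y_vals, m, b and sess defined above), you can compare the coefficients found by gradient descent with a closed-form fit from NumPy:

# Closed-form least-squares fit of a degree-1 polynomial: returns [slope, intercept]
slope, intercept = np.polyfit(x_vals, y_vals, 1)
print('closed-form slope:     ' + str(slope))
print('closed-form intercept: ' + str(intercept))

# Coefficients learned by gradient descent above
print('GD slope:     ' + str(sess.run(m)[0][0]))
print('GD intercept: ' + str(sess.run(b)[0][0]))

The closed-form values are what the gradient-descent estimates should approach as training continues.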

Cholesky Decomposition for Linear Regression with TensorFlow

Although Linear Least Squares Regression is simple and precise, it can be inefficient when the matrices get very large. The Cholesky decomposition is a more efficient way to solve the same least-squares problem: it factors a symmetric, positive-definite matrix into a lower triangular matrix L and its transpose Lᵀ. Linear regression with the Cholesky decomposition then reduces to solving two triangular systems of linear equations:

Xᵀ·X·A = Xᵀ·Y,  with  Xᵀ·X = L·Lᵀ

Solve L·z = Xᵀ·Y for z, then solve Lᵀ·A = z to obtain A.
The Cholesky decomposition is already implemented in TensorFlow (we apply it to Xᵀ·X); if you want to see how the factor L itself is computed, see the Wikipedia article on the Cholesky decomposition.
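To make the idea concrete, here is a minimal NumPy sketch (my own illustration, with a small hand-picked matrix) showing that the Cholesky factor L of a symmetric positive-definite matrix reconstructs it as L·Lᵀ, which is exactly what we will apply to Xᵀ·X below:

import numpy as np

A = np.array([[4., 2.], [2., 3.]])  # a symmetric positive-definite matrix
L = np.linalg.cholesky(A)           # lower triangular factor
print(L)
print(np.allclose(L @ L.T, A))      # True: L·L^T reconstructs A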

Now, let’s see how to implement it with TensorFlow:

import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

sess = tf.Session()

x_vals = np.linspace(start=0, stop=10, num=100)
y_vals = x_vals + np.random.normal(loc=0, scale=1, size=100)

x_vals_column = np.transpose(np.matrix(x_vals))
ones_column = np.transpose(np.matrix(np.repeat(a=1, repeats=100)))
X = np.column_stack((x_vals_column, ones_column))
Y = np.transpose(np.matrix(y_vals))
X_tensor = tf.constant(X)
Y_tensor = tf.constant(Y)

tX_X = tf.matmul(tf.transpose(X_tensor), X_tensor)
L = tf.cholesky(tX_X)
tX_Y = tf.matmul(tf.transpose(X_tensor), Y_tensor)
sol1 = tf.matrix_solve(L, tX_Y)
sol2 = tf.matrix_solve(tf.transpose(L), sol1)

solution_eval = sess.run(sol2)
m_slope = solution_eval[0][0]
b_intercept = solution_eval[1][0]
print('slope (m): ' + str(m_slope))
print('intercept (b): ' + str(b_intercept))

best_fit = []
for i in x_vals:
    best_fit.append(m_slope * i + b_intercept)

plt.plot(x_vals, y_vals, 'o', label='Data')
plt.plot(x_vals, best_fit, 'r-', label='Linear Regression', linewidth=3)
plt.legend(loc='upper left')
plt.show()

slope (m): 1.0830263227926582
intercept (b): -0.3348165868955632

[Plot: generated data points with the fitted regression line]

As you can see, the result is very similar to the one from Linear Least Squares, but this decomposition is often more efficient and numerically stable.

Linear Least Squares Regression with TensorFlow

Linear Least Squares Regression is by far the most widely used regression method, and it is suitable for most cases where the data behaves linearly. A line is defined by the following equation:

y = m·x + b

For all data points (xi, yi) we have to minimize the sum of the squared errors:

E(m, b) = Σ_i (y_i − (m·x_i + b))²

This is the equation we need to solve for all data points:

X·A = Y,  where the rows of X are (x_i, 1), A = (m, b)ᵀ, and Y is the column vector of the y_i values

The least-squares solution to this equation is A (I’m not going to derive it here; the derivation, along with sample code in several programming languages, can be found on Wikipedia), which is defined by:

A = (Xᵀ·X)⁻¹ · Xᵀ·Y

Now, let’s see the implementation with TensorFlow:

import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

sess = tf.Session()
x_vals = np.linspace(0, 10, num=100)
y_vals = x_vals + np.random.normal(loc=0, scale=1, size=100)

x_vals_column = np.transpose(np.matrix(x_vals))
ones_column = np.transpose(np.matrix(np.repeat(1, repeats=100)))
X = np.column_stack((x_vals_column, ones_column))
Y = np.transpose(np.matrix(y_vals))

X_tensor = tf.constant(X)
Y_tensor = tf.constant(Y)

tX_X = tf.matmul(tf.transpose(X_tensor), X_tensor)
tX_X_inv = tf.matrix_inverse(tX_X)
product = tf.matmul(tX_X_inv, tf.transpose(X_tensor))
A = tf.matmul(product, Y_tensor)
A_eval = sess.run(A)

m_slope = A_eval[0][0]
b_intercept = A_eval[1][0]
print('slope (m): ' + str(m_slope))
print('intercept (b): ' + str(b_intercept))

best_fit = []
for i in x_vals:
    best_fit.append(m_slope * i + b_intercept)

plt.plot(x_vals, y_vals, 'o', label='Data')
plt.plot(x_vals, best_fit, 'r-', label='Linear Regression', linewidth=3)
plt.legend(loc='upper left')
plt.show()

slope (m): 1.0108287140073253
intercept (b): 0.14322921334345343

[Plot: generated data points with the fitted regression line]

As you can see, the implementation just executes basic matrix operations; the advantage of using TensorFlow here is that this process can be embedded in a larger computational graph.
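As a quick cross-check (a sketch assuming the same X and Y matrices built above), NumPy's least-squares solver should return essentially the same coefficients:

# np.linalg.lstsq minimizes ||X*A - Y||^2 directly
A_np = np.linalg.lstsq(X, Y)[0]
print('slope (m): ' + str(A_np[0, 0]))
print('intercept (b): ' + str(A_np[1, 0]))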

Classification Loss Functions (Part II)

In my previous post, I covered three loss functions that are mostly intended for regression models. This time, I’m going to talk about classification loss functions, which are used to evaluate the loss when predicting categorical outcomes.

Let’s consider the following vector to help us show how these loss functions behave:

import tensorflow as tf

sess = tf.Session()

x_function = tf.linspace(-3., 5., 500)
target = tf.constant(1.)
targets = tf.fill([500, ], 1.)

Hinge Loss Function

This function is used for training classifiers, most notably SVMs (Support Vector Machines). It is defined as follows:

hinge(y, ŷ) = max(0, 1 − y·ŷ)

The central idea is to compute the loss between two target classes, 1 and -1: predictions on the correct side of the margin are not penalized, while wrong or low-confidence predictions are penalized linearly.

hinge_loss = tf.maximum(0., 1. - tf.multiply(target, x_function))
hinge_out = sess.run(hinge_loss)

Sigmoid Cross-Entropy Loss Function

This loss function, also referred to as the logistic loss, is used in machine learning for classification, for example when deciding between two classes, 0 and 1. TensorFlow computes it internally in a numerically stable way; mathematically it is defined as follows:

loss(x, z) = −z·log(σ(x)) − (1 − z)·log(1 − σ(x)),  where σ(x) = 1 / (1 + e^(-x)), x are the logits and z the labels

cross_entropy_sigmoid_loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_function, labels=targets)
cross_entropy_sigmoid_out = sess.run(cross_entropy_sigmoid_loss)
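If you want to check what TensorFlow computes internally, its documentation gives the numerically stable form max(x, 0) − x·z + log(1 + e^(-|x|)); here is a minimal sketch of the comparison (reusing x_function and targets from above):

# Manual, numerically stable sigmoid cross-entropy; should match the built-in op
manual_loss = tf.maximum(x_function, 0.) - x_function * targets + tf.log(1. + tf.exp(-tf.abs(x_function)))
print(sess.run(tf.reduce_max(tf.abs(manual_loss - cross_entropy_sigmoid_loss))))  # ~0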

Weighted Cross Entropy Loss Function

This is a weighted version of the previous loss function: it assigns a weight (pos_weight) to the positive targets. For example, we can use a weight of 0.5, as follows.

weight = tf.constant(0.5)
cross_entropy_weighted_loss = tf.nn.weighted_cross_entropy_with_logits(targets=targets, logits=x_function, pos_weight=weight)
cross_entropy_weighted_out = sess.run(cross_entropy_weighted_loss)

Let’s plot these loss functions!

import matplotlib.pyplot as plt

x_array = sess.run(x_function)
plt.plot(x_array, hinge_out, 'b-', label='Hinge Loss')
plt.plot(x_array, cross_entropy_sigmoid_out, 'k-.', label='Cross Entropy Sigmoid Loss')
plt.plot(x_array, cross_entropy_weighted_out, 'g:', label='Weighted Cross Entropy Loss (x0.5)')
plt.ylim(-1.5, 3)
plt.legend(loc='lower right', prop={'size': 11})
plt.show()

[Plot: hinge, sigmoid cross-entropy and weighted cross-entropy losses]

Conclusions

  • The hinge loss works well for SVMs, but because it grows without bound for badly misclassified points, it can be strongly affected by outliers.
  • Cross-entropy loss is very stable for training models, but it is less robust to outliers and mislabeled examples, which can hurt it on large, noisy datasets.

Loss Functions (Part 1)

Loss functions are essential to machine learning algorithms because they measure the error between the predicted outputs and the target values. Algorithms are optimized by evaluating outcomes against a specified loss function, and TensorFlow works this way as well. We can think of a loss function as telling us how good the predictions are compared to the expected values.

There are several loss functions we can use to train a machine learning algorithm; I’ll explain some of them and when to use them. Let’s consider the following vector to help us show how they behave:

import tensorflow as tf

sess = tf.Session()

# f(x) and target = 0
x_function = tf.linspace(-1., 1., 500)
target = tf.constant(0.)

L2-norm Loss Function (Least Squares Error LSE)

It is just the sum of the squared distances to the target:

L2 = Σ_i (y_i − ŷ_i)²

Because L2 squares the error, the penalty grows rapidly once the error exceeds 1 (outliers typically cause this), so the model is very sensitive to large deviations; when used to optimize an algorithm, it adjusts the model strongly to reduce them.

For any small adjustments of a data point, the regression line will move only slightly (regression parameters are continuous functions of the data).

TensorFlow has a built-in implementation, tf.nn.l2_loss(), which actually computes half of the previous expression:

tf.nn.l2_loss(t) = Σ_i t_i² / 2
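A minimal sketch to confirm that factor of one half (reusing x_function and target from above):

half_l2 = tf.nn.l2_loss(target - x_function)             # sum(t^2) / 2
full_l2 = tf.reduce_sum(tf.square(target - x_function))  # sum(t^2)
print(sess.run([half_l2, full_l2]))                      # the first value is half the second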

To show how the loss functions behave, we are going to plot the point-wise values before performing the summation.

L2_function = tf.square(target - x_function)
L2_output = sess.run(L2_function)

L1-norm Loss Function (Least Absolute Error LAE)

It is just the sum of the absolute distances to the target:

L1 = Σ_i |y_i − ŷ_i|

Comparing L1 with L2, we can see that L1 is less sensitive to errors caused by outliers, because it does not square them. So, if we need to limit the influence of outliers, L1 is the better choice; if outliers must be taken into account, L2 is preferable.

One issue to be aware of is that L1 is not smooth at the target, which can keep some algorithms from converging well.

L1_function = tf.abs(target - x_function)
L1_output = sess.run(L1_function)

Pseudo-Huber Loss Function

It is a smooth approximation to the Huber loss. The Huber loss is, as Wikipedia defines it, “a loss function used in robust regression, that is less sensitive to outliers in data than the squared error loss [LSE]”. It tries to take the best of L1 and L2 by behaving like L2 (quadratic) near the target and like L1 (less steep) for extreme values. Its shape depends on an extra parameter, delta, which dictates how steep that transition is.

L_δ(y, ŷ) = δ²·(√(1 + ((y − ŷ)/δ)²) − 1)

We are going to test 3 values for delta:

delta1 = tf.constant(0.2)
pseudo_huber1 = tf.multiply(tf.square(delta1), tf.sqrt(1. + tf.square((target - x_function)/delta1)) - 1.)
pseudo_huber1_output = sess.run(pseudo_huber1)

delta2 = tf.constant(1.)
pseudo_huber2 = tf.multiply(tf.square(delta2), tf.sqrt(1. + tf.square((target - x_function) / delta2)) - 1.)
pseudo_huber2_output = sess.run(pseudo_huber2)

delta3 = tf.constant(5.)
pseudo_huber3 = tf.multiply(tf.square(delta3), tf.sqrt(1. + tf.square((target - x_function) / delta3)) - 1.)
pseudo_huber3_output = sess.run(pseudo_huber3)

Let’s plot these loss functions!

import matplotlib.pyplot as plt

x_array = sess.run(x_function)
plt.plot(x_array, L2_output, 'b-', label='L2')
plt.plot(x_array, L1_output, 'r--', label='L1')
plt.plot(x_array, pseudo_huber1_output, 'm,', label='Pseudo-Huber (0.2)')
plt.plot(x_array, pseudo_huber2_output, 'k-.', label='Pseudo-Huber (1.0)')
plt.plot(x_array, pseudo_huber3_output, 'g:', label='Pseudo-Huber (5.0)')
plt.ylim(-0.2, 0.4)
plt.legend(loc='lower right', prop={'size': 11})
plt.title('LOSS FUNCTIONS')
plt.show()

[Plot: L2, L1 and Pseudo-Huber (0.2, 1.0, 5.0) losses]

Conclusions

I have several outliers, but they are not important; which loss function should I use?

The L1 loss function; but you may have trouble converging to the best solution, so consider a low learning rate.

I have several outliers, and they occur under circumstances I should take into account. Which loss function should I use?

The L2 loss function; but a very distant outlier can dominate the model, so consider normalizing the data first.

I have several outliers, I don’t want them to dominate my model, and I still need to converge to a good solution.

Use the Pseudo-Huber loss function, but take care with delta: a value that is too large (such as 5.0) lets outliers affect the model again, and a value that is too small (such as 0.1) makes convergence very slow.
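To see the effect of delta numerically, here is a small sketch of my own that evaluates the Pseudo-Huber formula for a single large residual of 4.0:

import math

residual = 4.0  # a point far from the target (an outlier)
for delta in (0.2, 1.0, 5.0):
    loss = delta ** 2 * (math.sqrt(1. + (residual / delta) ** 2) - 1.)
    print('delta = %.1f -> loss = %.3f' % (delta, loss))
# prints roughly 0.76 for delta=0.2, 3.12 for delta=1.0 and 7.02 for delta=5.0

With a small delta the penalty for the outlier stays modest and roughly linear, while a large delta behaves almost like L2 and lets the outlier dominate the total loss.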

Activation Functions in TensorFlow

The perceptron is a simple algorithm which, given an input vector x of m values (x1, x2, …, xm), outputs either 1 (ON) or 0 (OFF). Its function is defined as follows:

f(x) = 1 if ω·x + b > 0, and 0 otherwise

ω·x = Σ_j ω_j·x_j

Here, ω is a vector of weights, ω·x is the dot product, and b is the bias. The equation resembles that of a straight line: if x lies above the line, the answer is positive, otherwise it is negative. Ideally, however, we pass in training data and let the computer adjust the weights and bias so that the errors produced by this neuron are minimized. The learning process should be able to make small changes that progressively teach our neuron to classify the information as we want. In the following image we don’t have “small changes” but one big jump, and the neuron cannot learn this way because ω and b will not converge to the values that minimize the errors.

[Figure: the perceptron’s step (threshold) output]
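As a minimal illustration (with a made-up weight vector and bias, not taken from any trained model), the perceptron’s output is just a hard threshold on ω·x + b:

import numpy as np

def perceptron(x, w, b):
    # Output 1 (ON) if the weighted sum plus bias is positive, 0 (OFF) otherwise
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([0.5, -0.6])  # hypothetical weights
b = 0.1                    # hypothetical bias
print(perceptron(np.array([1.0, 0.2]), w, b))  # 1
print(perceptron(np.array([0.1, 1.0]), w, b))  # 0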

The slope of the tangent to this function is what tells the neuron how to adjust; for the step function that slope is zero everywhere except at x = 0, where it is infinite. That is not useful in practice, because real learning happens step by step. To make our neuron learn, we need something that changes progressively from 0 to 1: a continuous (and differentiable) function.
When we build neural networks, activation functions are an essential part of each neuron; they are what allow the weights and bias to be adjusted gradually.

In TensorFlow, we can find the activation functions in the neural network (nn) library.

Activation Functions

Sigmoid

σ(x) = 1 / (1 + e^(-x))

[Figure: sigmoid curve]

Mathematically, the function is continuous and differentiable. As we can see, the sigmoid behaves similarly to the perceptron’s threshold, but the change is gradual and the output can take values other than just 0 or 1.

Example:

>>> import tensorflow as tf
>>> sess = tf.Session()
>>> x = tf.lin_space(-3., 3., 24)
>>> print(sess.run(tf.nn.sigmoid(x)))
 [ 0.04742587 0.06070346 0.07739628 0.09819958 0.12384397 0.15503395
 0.1923546 0.23614843 0.28637746 0.34249979 0.40340331 0.46743745
 0.53256249 0.59659666 0.65750021 0.71362257 0.76385158 0.80764538
 0.84496599 0.87615603 0.90180045 0.92260367 0.9392966 0.95257413]

The sigmoid is the most common smooth activation function; even so, it is often avoided in practice because of its tendency to zero out the backpropagation terms (vanishing gradients) during training.
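The reason is easy to see from the derivative, σ'(x) = σ(x)·(1 − σ(x)), which approaches zero as |x| grows; a minimal sketch (reusing the tf and sess from the example above):

>>> s = tf.nn.sigmoid(tf.constant([-6., -3., 0., 3., 6.]))
>>> print(sess.run(s * (1. - s)))  # roughly [0.0025, 0.045, 0.25, 0.045, 0.0025]

During backpropagation these near-zero factors get multiplied together, which is what shrinks the gradient.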

ReLU (Rectified Linear Unit)

ReLU(x) = max(0, x)

[Figure: ReLU curve]

This function has become very popular because it produces very good experimental results. Its main advantage is that it accelerates the convergence of SGD (stochastic gradient descent) compared with the sigmoid and tanh functions.

This strength is, at the same time, its main weakness: large updates can push a neuron’s weights into a region where it never activates again for any input (a “dead” ReLU). For example, if the learning rate is set too high, as many as half of the neurons can “die”; with a properly chosen value the network will still learn, although more slowly than we might expect.

Example:

>>> import tensorflow as tf
>>> sess = tf.Session()
>>> x = tf.lin_space(-3., 3., 24)
>>> print(sess.run(tf.nn.relu(x)))
 [ 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0.13043475
 0.39130425 0.652174 0.9130435 1.173913 1.43478251 1.69565201
 1.95652151 2.21739101 2.47826099 2.7391305 3. ]

ReLU6

ReLU6(x) = min(max(0, x), 6)

[Figure: ReLU6 curve]

This function seems to have been introduced in “Convolutional Deep Belief Networks on CIFAR-10” (page 2). Its main advantage over plain ReLU is that it is computationally fast and does not suffer from vanishing (infinitesimally near zero) or exploding values. As you may be guessing, it is used in convolutional and recurrent neural networks.

Example:

>>> import tensorflow as tf
>>> sess = tf.Session()
>>> x = tf.lin_space(-3., 9., 24)
>>> print(sess.run(tf.nn.relu6(x)))
 [ 0. 0. 0. 0. 0. 0.
 0.13043475 0.652174 1.173913 1.69565201 2.21739101 2.7391305
 3.2608695 3.78260851 4.30434799 4.826087 5.347826 5.86956501
 6. 6. 6. 6. 6. 6. ]

Hyperbolic Tangent

tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x))

This function is very similar to the sigmoid, except that its range is -1 to 1 instead of 0 to 1. Sadly, it has the same vanishing-gradient problem as the sigmoid.

Example:

>>> import tensorflow as tf
>>> sess = tf.Session()
>>> x = tf.lin_space(-5., 5., 24)
>>> print(sess.run(tf.nn.tanh(x)))
 [-0.99990922 -0.9997834 -0.99948329 -0.99876755 -0.99706209 -0.9930048
 -0.98339087 -0.96082354 -0.90900028 -0.79576468 -0.57313168 -0.21403044
 0.21402998 0.57313132 0.79576457 0.90900022 0.96082354 0.98339081
 0.9930048 0.99706209 0.99876755 0.99948329 0.9997834 0.99990922]

Conclusion

These activation functions are what introduce nonlinearity into neural networks. Keep in mind that a node’s output is limited to the activation’s range: with a sigmoid, for example, the node can only output values between 0 and 1.

TensorFlow implements several other activation functions, such as softsign, softplus, ELU and CReLU, but they are used less frequently and most are variations of the functions already explained; the short sketch below shows how to call a few of them. With the exception of dropout (which is not strictly an activation function, but is heavily used during training and will be explained later), that covers what we need on this topic.
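For completeness, here is a minimal sketch (my own illustration, not from the original post) evaluating a few of those less common activations on a small input:

import tensorflow as tf

sess = tf.Session()
x = tf.constant([-2., -1., 0., 1., 2.])
print(sess.run(tf.nn.softsign(x)))  # x / (1 + |x|)
print(sess.run(tf.nn.softplus(x)))  # log(1 + e^x), a smooth version of ReLU
print(sess.run(tf.nn.elu(x)))       # e^x - 1 for x < 0, x for x >= 0

That’s all for activation functions. See you next time!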

Working with Matrices in TensorFlow

Matrices are the basic elements we use to exchange data through computational graphs. In general terms, a matrix can be defined as an order-2 tensor, so you can refer to Declaring tensors in TensorFlow to see the options you have for creating matrices.

Let’s define the matrices we are going to use in the examples:

import tensorflow as tf
import numpy as np

sess = tf.Session()

identity_matrix = tf.diag([1., 1., 1., 1., 1.])
mat_A = tf.truncated_normal([5, 2], dtype=tf.float32)
mat_B = tf.constant([[1., 2.], [3., 4.], [5., 6.], [7., 8.], [9., 10.]])
mat_C = tf.random_normal([5, ], mean=0, stddev=1.0)
mat_D = tf.convert_to_tensor(np.array([[1.2, 2.3, 3.4], [4.5, 5.6, 6.7], [7.8, 8.9, 9.10]]))

Matrix Operations

Addition and subtraction are simple operations that can be performed with the '+' and '-' operators, or with tf.add() and tf.subtract().

# A + B
>>> print(sess.run(mat_A + mat_B))
>>> print(sess.run(tf.add(mat_A, mat_B)))
 [[ 0.58516705 2.84226775]
 [ 2.3062849 4.91305351]
 [ 5.88148737 4.88284636]
 [ 6.40551376 6.56219101]
 [ 9.73429203 9.89524364]]
# B - B
>>> print(sess.run(mat_B - mat_B))
>>> print(sess.run(tf.subtract(mat_B, mat_B)))
 [[ 0. 0.]
 [ 0. 0.]
 [ 0. 0.]
 [ 0. 0.]
 [ 0. 0.]]

Matrix multiplication must follow this rule:

(m × n) · (n × p) → (m × p): the number of columns of the first matrix must equal the number of rows of the second.

If this rule is satisfied, we can perform the multiplication.

tf.matmul() performs this operation; optionally, each operand can first be transposed or adjointed (conjugated and transposed), and each can be flagged as sparse. For example:

# B * Identity
>>> print(sess.run(tf.matmul(mat_B, identity_matrix, transpose_a=True, transpose_b=False)))
 [[ 1. 3. 5. 7. 9.]
 [ 2. 4. 6. 8. 10.]]
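As a small sketch of the remaining options (my own example; for real-valued matrices the adjoint is just the transpose, and the sparse flags are only performance hints):

# Same product as above, written with adjoint_a instead of transpose_a
>>> print(sess.run(tf.matmul(mat_B, identity_matrix, adjoint_a=True)))
# Marking the (mostly zero) identity matrix as sparse; the result is unchanged
>>> print(sess.run(tf.matmul(mat_B, identity_matrix, transpose_a=True, b_is_sparse=True)))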

Other operations

# Transposed C
>>> print(sess.run(tf.transpose(mat_C)))
 [ 0.62711298 1.33686149 0.5819205 -0.85320765 0.59543872]
# Matrix Determinant D
>>> print(sess.run(tf.matrix_determinant(mat_D)))
 3.267
# Matrix Inverse D
>>> print(sess.run(tf.matrix_inverse(mat_D)))
 [[-2.65381084 2.85583104 -1.11111111]
 [ 3.46189164 -4.77502296 2.22222222]
 [-1.11111111 2.22222222 -1.11111111]]
# Cholesky decomposition
>>> print(sess.run(tf.cholesky(identity_matrix)))
 [[ 1. 0. 0. 0. 0.]
 [ 0. 1. 0. 0. 0.]
 [ 0. 0. 1. 0. 0.]
 [ 0. 0. 0. 1. 0.]
 [ 0. 0. 0. 0. 1.]]
# Eigen decomposition
>>> print(sess.run(tf.self_adjoint_eig(mat_D)))
 (array([ -3.77338787, -0.85092622, 20.52431408]), array([[-0.76408782, -0.4903048 , 0.41925053],
 [-0.21176465, 0.8045062 , 0.55491037],
 [ 0.60936487, -0.33521781, 0.71854261]]))

Element-wise Operations

# A * B (Element-wise)
>>> print(sess.run(tf.multiply(mat_A, mat_B)))
# A / B (integer division, element-wise)
>>> print(sess.run(tf.div([2, 2], [5, 4])))
 [0 0]
# A / B (true division, element-wise)
>>> print(sess.run(tf.truediv([2, 2], [5, 4])))
 [ 0.4 0.5]
# A / B Floor-approximation (Element-wise)
>>> print(sess.run(tf.floordiv([8, 8], [5, 4])))
 [1 2]
# A/B Remainder (Element-wise)
>>> print(sess.run(tf.mod([8, 8], [5, 4])))
 [3 0]

Cross-product

>>> print(sess.run(tf.cross([1, -1, 2], [5, 1, 3])))
 array([-5, 7, 6], dtype=int32)

We’ve completed all theoretical prerequisites for TensorFlow. Once we understand matrices, variables and placeholders, we can continue with Core TensorFlow. See you next time!

Understanding Variables and Placeholders in TensorFlow

When we start using TensorFlow, it is tempting to treat defining variables as something as trivial as a Hello World program. However, understanding how variables (and placeholders) work under the hood is important, because more complex concepts rely on them heavily; without a clear picture of how information flows between variables, it is harder to follow the algorithms implemented in TensorFlow.

Variables are the parameters of the algorithm. The main way to create one is the tf.Variable() function, but we still need to initialize it; initializing is what actually places the variable, with its corresponding operations, on the computational graph.

import tensorflow as tf

seq = tf.linspace(0., 7., 8)
seq_var = tf.Variable(seq)

# Initialize variables in session
sess = tf.Session()
initialize_op = tf.global_variables_initializer()
sess.run(initialize_op)

While each variable has its own initializer operation, the most common approach is to use global_variables_initializer(), which creates a single operation in the graph that initializes all variables. Nevertheless, we can also initialize variables one at a time, even making one depend on the result of initializing another, as follows:

sess = tf.Session()
first_var = tf.Variable(tf.lin_space(0., 7, 8), name='1st_var')
sess.run(first_var.initializer)
# first_var: <tf.Variable '1st_var:0' shape=(8,), dtype=float32_ref>

# second_var dimensions depends on first_var
second_var = tf.Variable(tf.zeros_like(first_var), name='2nd_var')
sess.run(second_var.initializer)
# second_var: <tf.Variable '2nd_var:0' shape=(8,), dtype=float32_ref>

Placeholders simply hold the position for data that will be fed into the graph. To get a placeholder into the graph, we must perform at least one operation on it.

import numpy as np

sess = tf.Session()
x = tf.placeholder(tf.float32, shape=[2, 2])
# y is the operation to run on x placeholder
y = tf.identity(x)

# x_vals is data to feed into the x placeholder
x_vals = np.random.rand(2, 2)
# Runs y operation
sess.run(y, feed_dict={x: x_vals})

Note that we fetch the operation y rather than the placeholder x itself: TensorFlow will not return a placeholder that is self-referenced in the feed dictionary.

With these concepts clear, we can move forward with TensorFlow. See you next time with more TF!

Declaring tensors in TensorFlow

[Requirement: TensorFlow and NumPy installed on Python 3.5+]
[Requirement: import tensorflow as tf]
[Requirement: import numpy as np]

Tensors are the primary data structure we use in TensorFlow and, as Wikipedia describes them, “tensors are geometric objects that describe linear relations between geometric vectors, scalars and other tensors”. Tensors can be described as multidimensional arrays, embracing the concepts of scalar, vector and matrix without regard to the coordinate system.

The order of a tensor is the number of indexes we need to specify one element; so a scalar is an order 0 tensor, a vector an order 1 tensor, a matrix an order 2 tensor, and so on.

[Fig 1. An order 3 tensor]
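A quick sketch of tensors of order 0, 1 and 2 (tf.rank returns the order of a tensor):

>>> scalar = tf.constant(3.)                    # order 0
>>> vector = tf.constant([1., 2., 3.])          # order 1
>>> matrix = tf.constant([[1., 2.], [3., 4.]])  # order 2
>>> tf.InteractiveSession().run([tf.rank(scalar), tf.rank(vector), tf.rank(matrix)])
[0, 1, 2]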

Now that we know what a tensor is, let’s see how we can declare tensors in TensorFlow.

1. Fixed tensors

>>> zeros_tsr = tf.zeros([5, 5], dtype=tf.int32, name='zeros5x5')
>>> print(zeros_tsr)
Tensor("zeros5x5:0", shape=(5, 5), dtype=int32)
>>> tf.InteractiveSession().run(zeros_tsr)
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
>>> ones_tsr = tf.ones([5, 5], dtype=tf.float32, name='ones5x5')
>>> print(ones_tsr)
Tensor("ones5x5:0", shape=(5, 5), dtype=float32)
>>> tf.InteractiveSession().run(ones_tsr)
array([[ 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1.]], dtype=float32)
>>> filled_tsr = tf.fill([5, 5], 123, name='filled123')
>>> print(filled_tsr)
Tensor("filled123:0", shape=(5, 5), dtype=int32)
>>> tf.InteractiveSession().run(filled_tsr)
array([[123, 123, 123, 123, 123],
      [123, 123, 123, 123, 123],
      [123, 123, 123, 123, 123],
      [123, 123, 123, 123, 123],
      [123, 123, 123, 123, 123]])
>>> filled2_tsr = tf.constant(123, shape=[5, 5], name='filled123_2', dtype=tf.int16)
>>> print(filled2_tsr)
Tensor("filled123_2:0", shape=(5, 5), dtype=int16)
>>> tf.InteractiveSession().run(filled2_tsr)
array([[123, 123, 123, 123, 123],
       [123, 123, 123, 123, 123],
       [123, 123, 123, 123, 123],
       [123, 123, 123, 123, 123],
       [123, 123, 123, 123, 123]], dtype=int16)
>>> constant_tsr = tf.constant([1, 2, 3], name='vector')
>>> print(constant_tsr)
Tensor("vector:0", shape=(3,), dtype=int32)
>>> tf.InteractiveSession().run(constant_tsr)
array([1, 2, 3])

2. Copying dimensions

The tensors whose dimensions we are going to copy must already be defined.

>>> zeros_similar = tf.zeros_like(constant_tsr)
>>> print(zeros_similar)
Tensor("zeros_like:0", shape=(3,), dtype=int32)
>>> tf.InteractiveSession().run(zeros_similar)
array([0, 0, 0])
>>> ones_similar = tf.ones_like(constant_tsr)
>>> print(ones_similar)
Tensor("ones_like:0", shape=(3,), dtype=int32)
>>> tf.InteractiveSession().run(ones_similar)
array([1, 1, 1])

3. Sequence tensors

# This tensor holds 7 evenly spaced values between 0 and 2; the first parameter must be float32/float64
>>> linear_tsr = tf.linspace(0., 2, 7)
>>> print(linear_tsr)
Tensor("LinSpace_5:0", shape=(7,), dtype=float32)
>>> tf.InteractiveSession().run(linear_tsr)
array([ 0. , 0.33333334, 0.66666669, 1. , 1.33333337,
         1.66666675, 2. ], dtype=float32)
# This tensor defines 4 elements between 6 and 17, with a delta of 3
>>> int_seq_tsr = tf.range(start=6, limit=17, delta=3)
>>> print(int_seq_tsr)
Tensor("range_1:0", shape=(4,), dtype=int32)
>>> tf.InteractiveSession().run(int_seq_tsr)
array([ 6, 9, 12, 15])

4. Random tensors

# Random numbers from uniform distribution
>>> rand_unif_tsr = tf.random_uniform([5, 5], minval=0, maxval=1)
>>> print(rand_unif_tsr)
Tensor("random_uniform:0", shape=(5, 5), dtype=float32)
>>> tf.InteractiveSession().run(rand_unif_tsr)
array([[ 0.81911492, 0.01300693, 0.47359812, 0.50176537, 0.27962267],
       [ 0.47069478, 0.7151444 , 0.56615186, 0.5431906 , 0.45684898],
       [ 0.00939894, 0.19539773, 0.37774849, 0.08342052, 0.87758613],
       [ 0.46707201, 0.32422674, 0.90311491, 0.42251813, 0.3496896 ],
       [ 0.75080729, 0.48055971, 0.49421525, 0.77542639, 0.99400854]], dtype=float32)
# Random numbers from normal distribution
>>> rand_normal_tsr = tf.random_normal([5, 5], mean=0.0, stddev=1.0)
>>> print(rand_normal_tsr)
Tensor("random_normal:0", shape=(5, 5), dtype=float32)
>>> tf.InteractiveSession().run(rand_normal_tsr)
array([[ 2.13312769, 2.46189046, -0.34942248, -0.39776739, 1.79048693],
       [ 0.22045165, 0.05164593, -1.05943978, -0.32593197, -1.66411078],
       [-0.94263768, 1.77081263, -0.22290479, -0.24516548, 1.26560402],
       [-1.14855564, -0.89211422, 1.10751343, -2.17768288, -1.07004178],
       [ 0.635813 , 0.24745767, 0.80117846, -0.25315794, -1.88987064]], dtype=float32)
# Random numbers from a normal distribution, limited to within 2 standard deviations of the mean
>>> trunc_norm_tsr = tf.truncated_normal([5, 5], mean=0.0, stddev=1.0)
>>> print(trunc_norm_tsr)
Tensor("truncated_normal:0", shape=(5, 5), dtype=float32)
>>> tf.InteractiveSession().run(trunc_norm_tsr)
array([[-1.07923198, 0.66122717, -0.98569149, -0.11161296, -1.39560068],
       [-1.04248953, 1.83589756, 0.00709002, -0.70119679, 0.81637812],
       [-1.14046562, 0.65371871, 0.25081205, 1.59802651, -0.17030434],
       [ 0.61106592, -0.39884251, -0.02136615, 0.36585283, -1.45338166],
       [-0.64861351, 0.930076 , -0.1549242 , 1.45601475, 0.56357914]], dtype=float32)
# Shuffles existing tensor
>>> seq = tf.linspace(0., 7, 8)
>>> tf.InteractiveSession().run(seq)
array([ 0., 1., 2., 3., 4., 5., 6., 7.], dtype=float32)
>>> tf.InteractiveSession().run(tf.random_shuffle(seq))
array([ 5., 0., 4., 1., 6., 7., 2., 3.], dtype=float32)
# Randomly crops the existing tensor to the specified dimensions
>>> tf.InteractiveSession().run(tf.random_crop(seq, [3,]))
array([ 2., 3., 4.], dtype=float32)

5. Converted from NumPy

>>> np_array = np.array([[1, 2], [3, 4]])
>>> np_tsr = tf.convert_to_tensor(np_array, dtype=tf.int32)
>>> print(np_tsr)
Tensor("Const:0", shape=(2, 2), dtype=int32)
>>> tf.InteractiveSession().run(np_tsr)
array([[1, 2],
       [3, 4]])

Once you have created your desired tensor, you should wrap it as a TensorFlow Variable, like this:

seq_var = tf.Variable(seq)

That’s all for today. See you next time with more TensorFlow!
