Classification Loss Functions (Part II)

In my previous post, I covered three loss functions that are mostly intended for regression models. This time I’m going to talk about classification loss functions, which are used to evaluate loss when predicting categorical outcomes.

Let’s consider the following vector to help us to show how loss functions behave:

import tensorflow as tf

sess = tf.Session()

x_function = tf.linspace(-3., 5., 500)
target = tf.constant(1.)
targets = tf.fill([500, ], 1.)

Hinge Loss Function

This function is used for training classifiers, most notably for SVM (Support Vector Machine). It is defined by the following:

Hinge(x) = max(0, 1 - t · x)

The central idea is to compute a loss between two target classes, 1 and -1.

hinge_loss = tf.maximum(0., 1. - tf.multiply(target, x_function))
hinge_out =
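For readers without a TensorFlow session handy, the same computation can be sketched in plain Python (a minimal illustration; the helper `hinge_loss` below is hypothetical, not TensorFlow’s implementation):

```python
# Hinge loss: max(0, 1 - t * x), with target class t = 1
def hinge_loss(x, target=1.0):
    return max(0.0, 1.0 - target * x)

# Predictions at or beyond the margin (x >= 1) incur zero loss;
# the loss grows linearly as the prediction moves toward the other class.
losses = [hinge_loss(x) for x in (-1.0, 0.0, 0.5, 1.0, 2.0)]
# losses == [2.0, 1.0, 0.5, 0.0, 0.0]
```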

Sigmoid Cross-Entropy Loss Function

This loss function can be used in machine learning for classification and optimization. It is also referred to as the logistic loss function, and can be used, for example, when we are classifying between two classes, 0 and 1. TensorFlow implements this function internally, but mathematically it is defined as follows:

Loss = -z · log(sigmoid(x)) - (1 - z) · log(1 - sigmoid(x)), where sigmoid(x) = 1 / (1 + exp(-x))

cross_entropy_sigmoid_loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_function, labels=targets)
cross_entropy_sigmoid_out =
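The naive formula above is numerically unstable for large logits; TensorFlow’s documentation gives an equivalent stable form. A plain-Python sketch of both (the helper name is hypothetical):

```python
import math

# Naive form: loss = -z*log(sigmoid(x)) - (1 - z)*log(1 - sigmoid(x))
# Numerically stable equivalent (the form documented for TensorFlow):
#   loss = max(x, 0) - x*z + log(1 + exp(-|x|))
def sigmoid_cross_entropy(logit, label):
    return max(logit, 0.0) - logit * label + math.log(1.0 + math.exp(-abs(logit)))

# A confident, correct prediction (large logit, label 1) gives a small loss
confident = sigmoid_cross_entropy(5.0, 1.0)

# The two forms agree for a moderate logit
naive = -math.log(1.0 / (1.0 + math.exp(-2.0)))
stable = sigmoid_cross_entropy(2.0, 1.0)
```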

Weighted Cross Entropy Loss Function

This is a weighted version of the previous loss function: we assign a weight to the positive targets. For example, we can provide a weight of 0.5, as follows.

weight = tf.constant(0.5)
cross_entropy_weighted_loss = tf.nn.weighted_cross_entropy_with_logits(targets=targets, logits=x_function, pos_weight=weight)
cross_entropy_weighted_out =
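To see what the weight does, here is the plain definition with the positive term scaled, sketched in pure Python (an illustration only; TensorFlow’s actual implementation uses a numerically stable variant, and the helper names here are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Positive (label = 1) term scaled by pos_weight:
#   loss = -pos_weight * z * log(sigmoid(x)) - (1 - z) * log(1 - sigmoid(x))
def weighted_cross_entropy(logit, label, pos_weight):
    return (-pos_weight * label * math.log(sigmoid(logit))
            - (1.0 - label) * math.log(1.0 - sigmoid(logit)))

# At logit 0 with label 1 the unweighted loss is log(2);
# pos_weight = 0.5 halves the penalty on the positive target.
full = weighted_cross_entropy(0.0, 1.0, 1.0)
half = weighted_cross_entropy(0.0, 1.0, 0.5)
```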

Let’s plot these loss functions!

import matplotlib.pyplot as plt

x_array =
plt.plot(x_array, hinge_out, 'b-', label='Hinge Loss')
plt.plot(x_array, cross_entropy_sigmoid_out, 'k-.', label='Cross Entropy Sigmoid Loss')
plt.plot(x_array, cross_entropy_weighted_out, 'g:', label='Weighted Cross Entropy Loss (x0.5)')
plt.ylim(-1.5, 3)
plt.legend(loc='lower right', prop={'size': 11})



  • The hinge loss function works great for SVMs, but it is affected by outliers.
  • Cross-entropy loss is very stable during training, but it is less robust and can be affected by outliers in large datasets.

Loss Functions (Part 1)

Implementing loss functions is very important in machine learning algorithms because they let us measure the error between the predicted outputs and the target values. Algorithms get optimized by evaluating outcomes depending on a specified loss function, and TensorFlow works this way as well. We can think of loss functions as telling us how good the predictions are compared to the expected values.

There are several loss functions we can use to train a machine learning algorithm, and I’ll try to explain some of them and when to use them. Let’s consider the following vector to help us show how loss functions behave:

import tensorflow as tf

sess = tf.Session()

# f(x) and target = 0
x_function = tf.linspace(-1., 1., 500)
target = tf.constant(0.)

L2-norm Loss Function (Least Squares Error LSE)

It is just the sum of the squared distances to the target:

L2 = sum((target - x_i)^2)


L2 squares the error, which grows very fast when the error > 1 (an outlier can cause this kind of error), so the model is very sensitive to variations; when used to optimize an algorithm, it adjusts the model to strongly minimize these large errors.

For any small adjustments of a data point, the regression line will move only slightly (regression parameters are continuous functions of the data).

TensorFlow has a built-in implementation, called tf.nn.l2_loss(), which actually computes half of the previous equation.


In order to show you how loss functions behave, we are going to plot the per-point values before performing the summation.

L2_function = tf.square(target - x_function)
L2_output =

L1-norm Loss Function (Least Absolute Error LAE)

It is just the sum of the absolute values of the distances to the target:

L1 = sum(|target - x_i|)


If we compare L1 with L2, we can deduce that L1 is less sensitive to errors caused by outliers (because it doesn’t square the error). So, if we need to ignore the effects of outliers, we could consider using L1 instead of L2; if it is important to account for outliers, then L2 is a better option.

One issue to be aware of is that the L1 is not smooth at the target and this can result in algorithms not converging well.

L1_function = tf.abs(target - x_function)
L1_output =
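The outlier sensitivity is easy to see numerically with a plain-Python sketch (the helper functions are hypothetical, just for illustration):

```python
# L2 squares the error while L1 takes its absolute value, so a single
# outlier with error 10 contributes 100 to L2 but only 10 to L1.
def l2_loss(pred, target):
    return (target - pred) ** 2

def l1_loss(pred, target):
    return abs(target - pred)

small_error = 0.5
outlier = 10.0
# For a small error, L2 is gentler (0.25 vs 0.5)...
l2_small, l1_small = l2_loss(small_error, 0.0), l1_loss(small_error, 0.0)
# ...but for the outlier, L2 explodes (100 vs 10).
l2_big, l1_big = l2_loss(outlier, 0.0), l1_loss(outlier, 0.0)
```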

Pseudo-Huber Loss Function

It is a smooth approximation to the Huber loss function. Huber loss is, as Wikipedia defines it, “a loss function used in robust regression, that is less sensitive to outliers in data than the squared error loss [LSE]”. This loss function attempts to take the best of L1 and L2 by being convex (L2-like) near the target and less steep (L1-like) for extreme values. The form depends on an extra parameter, delta, which dictates how steep it will be:

Pseudo-Huber(x) = delta^2 * (sqrt(1 + ((target - x) / delta)^2) - 1)


We are going to test 3 values for delta:

delta1 = tf.constant(0.2)
pseudo_huber1 = tf.multiply(tf.square(delta1), tf.sqrt(1. + tf.square((target - x_function) / delta1)) - 1.)
pseudo_huber1_output =

delta2 = tf.constant(1.)
pseudo_huber2 = tf.multiply(tf.square(delta2), tf.sqrt(1. + tf.square((target - x_function) / delta2)) - 1.)
pseudo_huber2_output =

delta3 = tf.constant(5.)
pseudo_huber3 = tf.multiply(tf.square(delta3), tf.sqrt(1. + tf.square((target - x_function) / delta3)) - 1.)
pseudo_huber3_output =
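The two regimes of the Pseudo-Huber loss can be checked in plain Python (a sketch with a hypothetical helper, not TensorFlow code):

```python
import math

# Pseudo-Huber: delta^2 * (sqrt(1 + ((target - x) / delta)^2) - 1)
def pseudo_huber(x, target=0.0, delta=1.0):
    return delta ** 2 * (math.sqrt(1.0 + ((target - x) / delta) ** 2) - 1.0)

# Near the target it behaves like L2 (roughly error^2 / 2)...
near = pseudo_huber(0.01)
# ...while far away it grows roughly linearly, like delta * |error|.
far = pseudo_huber(100.0)
```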

Let’s plot these loss functions!

import matplotlib.pyplot as plt

x_array =
plt.plot(x_array, L2_output, 'b-', label='L2')
plt.plot(x_array, L1_output, 'r--', label='L1')
plt.plot(x_array, pseudo_huber1_output, 'm,', label='Pseudo-Huber (0.2)')
plt.plot(x_array, pseudo_huber2_output, 'k-.', label='Pseudo-Huber (1.0)')
plt.plot(x_array, pseudo_huber3_output, 'g:', label='Pseudo-Huber (5.0)')
plt.ylim(-0.2, 0.4)
plt.legend(loc='lower right', prop={'size': 11})
plt.title('LOSS FUNCTIONS')



I have several outliers, but they are not so important. Which loss function should I use?

Use the L1 loss function, but you will probably have problems converging to the best solution, so consider a low learning rate.

I have several outliers, and they occur under circumstances that I should take into account. Which loss function should I use?

Use the L2 loss function, but a far-away outlier could still affect the model, so you should probably consider normalizing the data first.

I have several outliers, I don’t want them to affect my model and I need to converge to the best solution.

Use the Pseudo-Huber loss function, but take care with DELTA: a value that is too big, such as 5.0, could let the outliers affect your model again, and a value that is too small, such as 0.1, could make your model very slow to converge to a solution.

Activation Functions in TensorFlow

Perceptron is a simple algorithm which, given an input vector x of m values (x1, x2, …, xm), outputs either 1 (ON) or 0 (OFF), and we define its function as follows:

f(x) = 1 if ω · x + b > 0, otherwise 0



Here, ω is a vector of weights, ω·x is the dot product, and b is the bias. This equation resembles the equation for a straight line: if x lies above this line, then the answer is positive, otherwise it is negative. However, ideally we are going to pass training data and let the computer adjust the weights and bias in such a way that the errors produced by this neuron are minimized. The learning process should be able to recognize small changes that progressively teach our neuron to classify the information as we want. With a step function we don’t have “small changes” but one big change, and the neuron is not able to learn this way because ω and b will not converge to the optimal values that minimize the errors.


The slope (tangent) of this function indicates whether our neuron is learning; and, as we can deduce, the tangent of the step function at x = 0 is infinite. This is not possible in real scenarios because in real life we all learn step by step. In order to make our neuron learn, we need something that progressively changes from 0 to 1: a continuous (and differentiable) function.
When we start using neural networks, we use activation functions as an essential part of a neuron. These activation functions will allow us to adjust weights and bias.

In TensorFlow, we can find the activation functions in the neural network (nn) library.

Activation Functions




Sigmoid

sigmoid(x) = 1 / (1 + exp(-x))

Mathematically, the function is continuous. As we can see, the sigmoid behaves similarly to the perceptron, but the changes are gradual and we can get output values other than 0 and 1.


>>> import tensorflow as tf
>>> sess = tf.Session()
>>> x = tf.lin_space(-3., 3., 24)
>>> print(
 [ 0.04742587 0.06070346 0.07739628 0.09819958 0.12384397 0.15503395
 0.1923546 0.23614843 0.28637746 0.34249979 0.40340331 0.46743745
 0.53256249 0.59659666 0.65750021 0.71362257 0.76385158 0.80764538
 0.84496599 0.87615603 0.90180045 0.92260367 0.9392966 0.95257413]

The sigmoid function is the best-known activation function; however, nowadays it is often avoided because of its tendency to zero out the backpropagation terms (vanishing gradients) during training.
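The saturation behind that vanishing-gradient tendency can be seen with a plain-Python sketch (a hypothetical helper, not TensorFlow’s implementation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# The output is always in (0, 1), and it saturates quickly:
mid = sigmoid(0.0)    # exactly 0.5
high = sigmoid(6.0)   # very close to 1
low = sigmoid(-6.0)   # very close to 0

# The gradient sigmoid(x) * (1 - sigmoid(x)) is at most 0.25,
# which is why deep stacks of sigmoids shrink the backpropagated signal.
max_grad = sigmoid(0.0) * (1 - sigmoid(0.0))
```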

ReLU (Rectified Linear Unit)

ReLU(x) = max(0, x)



This function has become very popular because it generates very good experimental results. The best advantage of ReLUs is that this function accelerates the convergence of SGD (stochastic gradient descent, which indicates how fast our neuron is learning), compared to Sigmoid and tanh functions.

This strength is, at the same time, its main weakness, because this “learning speed” can make the neuron’s weights oscillate away from the optimal values, or push them to a region where the neuron never activates for any input (a “dead” ReLU). For example, if the learning rate is too high, up to half of the neurons can be “dead”; if we set a proper value, our network will learn, but more slowly than we would expect.


>>> import tensorflow as tf
>>> sess = tf.Session()
>>> x = tf.lin_space(-3., 3., 24)
>>> print(
 [ 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0.13043475
 0.39130425 0.652174 0.9130435 1.173913 1.43478251 1.69565201
 1.95652151 2.21739101 2.47826099 2.7391305 3. ]




ReLU6

ReLU6(x) = min(max(0, x), 6)

It seems this function was introduced in “Convolutional Deep Belief Networks on CIFAR-10” (page 2). Its main advantage, compared to simple ReLU, is that it is computationally faster and does not suffer from vanishing (infinitesimally near zero) or exploding values. As you may have figured out, it is used in Convolutional Neural Networks and Recurrent Neural Networks.


>>> import tensorflow as tf
>>> sess = tf.Session()
>>> x = tf.lin_space(-3., 9., 24)
>>> print(
 [ 0. 0. 0. 0. 0. 0.
 0.13043475 0.652174 1.173913 1.69565201 2.21739101 2.7391305
 3.2608695 3.78260851 4.30434799 4.826087 5.347826 5.86956501
 6. 6. 6. 6. 6. 6. ]
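Both rectifiers are one-liners in plain Python (hypothetical helpers mirroring the semantics of tf.nn.relu and tf.nn.relu6):

```python
def relu(x):
    # Negative inputs are clipped to zero; positives pass through unchanged
    return max(0.0, x)

def relu6(x):
    # ReLU additionally capped at 6, as in tf.nn.relu6
    return min(max(0.0, x), 6.0)

vals = [-3.0, 2.0, 9.0]
relu_out = [relu(v) for v in vals]    # [0.0, 2.0, 9.0]
relu6_out = [relu6(v) for v in vals]  # [0.0, 2.0, 6.0]
```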

Hyperbolic Tangent

tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

This function is very similar to the sigmoid, except that instead of having a range between 0 and 1, it has a range between -1 and 1. Sadly, it has the same vanishing-gradient problem as the sigmoid.


>>> import tensorflow as tf
>>> sess = tf.Session()
>>> x = tf.lin_space(-5., 5., 24)
>>> print(
 [-0.99990922 -0.9997834 -0.99948329 -0.99876755 -0.99706209 -0.9930048
 -0.98339087 -0.96082354 -0.90900028 -0.79576468 -0.57313168 -0.21403044
 0.21402998 0.57313132 0.79576457 0.90900022 0.96082354 0.98339081
 0.9930048 0.99706209 0.99876755 0.99948329 0.9997834 0.99990922]
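The similarity to the sigmoid is exact: tanh is just a shifted, rescaled sigmoid, which we can verify in plain Python (the sigmoid helper is hypothetical):

```python
import math

# tanh outputs in (-1, 1) and saturates at the extremes, like the sigmoid
vals = [math.tanh(x) for x in (-5.0, 0.0, 5.0)]

# tanh is a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

rescaled = 2.0 * sigmoid(2.0 * 1.3) - 1.0
```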


These activation functions help us introduce nonlinearities in neural networks; keep in mind that a function’s range constrains the outputs: if the range is between 0 and 1 (sigmoid), then the node can only output values between 0 and 1.

TensorFlow implements some other activation functions, like softsign, softplus, ELU, and CReLU, but most of them are not used as frequently, and the others are variations of the functions already explained. With the exception of dropout (which is not precisely an activation function, but is heavily used in training, and I will explain it later), we have covered all the material for this topic in TensorFlow. See you next time!

Working with Matrices in TensorFlow

Matrices are the basic elements we use to interchange data through computational graphs. In general terms, a tensor can be defined as a matrix, so you can refer to Declaring tensors in TensorFlow to see the options you have to create matrices.

Let’s define the matrices we are going to use in the examples:

import tensorflow as tf
import numpy as np

sess = tf.Session()

identity_matrix = tf.diag([1., 1., 1., 1., 1.])
mat_A = tf.truncated_normal([5, 2], dtype=tf.float32)
mat_B = tf.constant([[1., 2.], [3., 4.], [5., 6.], [7., 8.], [9., 10.]])
mat_C = tf.random_normal([5, ], mean=0, stddev=1.0)
mat_D = tf.convert_to_tensor(np.array([[1.2, 2.3, 3.4], [4.5, 5.6, 6.7], [7.8, 8.9, 9.10]]))

Matrix Operations

Addition and subtraction are simple operations that can be performed with the ‘+’ and ‘-’ operators, or with tf.add() and tf.subtract().

# A + B
>>> print( + mat_B))
>>> print(, mat_B)))
 [[ 0.58516705 2.84226775]
 [ 2.3062849 4.91305351]
 [ 5.88148737 4.88284636]
 [ 6.40551376 6.56219101]
 [ 9.73429203 9.89524364]]
# B - B
>>> print( - mat_B))
>>> print(, mat_B)))
 [[ 0. 0.]
 [ 0. 0.]
 [ 0. 0.]
 [ 0. 0.]
 [ 0. 0.]]

Matrix multiplication must follow this rule: an (m × n) matrix can only be multiplied by an (n × p) matrix, and the result is an (m × p) matrix (the inner dimensions must match).

If this rule is satisfied, then we can perform the multiplication.

tf.matmul() performs this operation; optionally, we can first transpose or adjoint (conjugate and transpose) either matrix, and we can mark either matrix as sparse. For example:

# B (transposed) * Identity
>>> print(, identity_matrix, transpose_a=True, transpose_b=False)))
 [[ 1. 3. 5. 7. 9.]
 [ 2. 4. 6. 8. 10.]]
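The shape rule can be checked with a tiny plain-Python matrix multiply (a sketch to illustrate the rule, not how TensorFlow implements matmul):

```python
def matmul(a, b):
    # (m x n) times (n x p): the inner dimensions must agree
    m, n = len(a), len(a[0])
    n2, p = len(b), len(b[0])
    assert n == n2, "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

# Multiplying by the identity leaves a matrix unchanged
b = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
result = matmul(b, identity)   # == b
```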

Other operations

# Transposed C
>>> print(
 [ 0.62711298 1.33686149 0.5819205 -0.85320765 0.59543872]
# Matrix Determinant D
>>> print(
# Matrix Inverse D
>>> print(
 [[-2.65381084 2.85583104 -1.11111111]
 [ 3.46189164 -4.77502296 2.22222222]
 [-1.11111111 2.22222222 -1.11111111]]
# Cholesky decomposition
>>> print(
 [[ 1. 0. 0. 0. 0.]
 [ 0. 1. 0. 0. 0.]
 [ 0. 0. 1. 0. 0.]
 [ 0. 0. 0. 1. 0.]
 [ 0. 0. 0. 0. 1.]]
# Eigen decomposition (returns eigenvalues and eigenvectors)
>>> print(
 (array([ -3.77338787, -0.85092622, 20.52431408]), array([[-0.76408782, -0.4903048 , 0.41925053],
 [-0.21176465, 0.8045062 , 0.55491037],
 [ 0.60936487, -0.33521781, 0.71854261]]))

Element-wise Operations

# A * B (Element-wise)
>>> print(, mat_B)))
# A / B Integer division (Element-wise)
>>> print([2, 2], [5, 4])))
 [0 0]
# A / B (Element-wise)
>>> print([2, 2], [5, 4])))
 [ 0.4 0.5]
# A / B Floor-approximation (Element-wise)
>>> print([8, 8], [5, 4])))
 [1 2]
# A / B Remainder (Element-wise)
>>> print([8, 8], [5, 4])))
 [3 0]


# Cross product
>>> print([1, -1, 2], [5, 1, 3])))
 array([-5, 7, 6], dtype=int32)

We’ve completed all theoretical prerequisites for TensorFlow. Once we understand matrices, variables and placeholders, we can continue with Core TensorFlow. See you next time!

Understanding Variables and Placeholders in TensorFlow

Usually, when we start using TensorFlow, it’s very common to think that defining variables is just as trivial as a Hello World program, but understanding how variables (and placeholders) work under the hood is very important for understanding more complex concepts, because those concepts use variables and placeholders heavily; if we don’t understand the information flow between variables, it can be harder to get a clear idea of the algorithms implemented in TensorFlow.

Variables are the parameters of the algorithm. The main way to create a variable is with the Variable() function, although we still need to initialize it. Initializing is what puts the variable, with its corresponding methods, on the computational graph.

seq = tf.linspace(0., 7, 8)
seq_var = tf.Variable(seq)

# Initialize variables in session
sess = tf.Session()
initialize_op = tf.global_variables_initializer()

While each variable has an initializer() method, the most common way to do this is with the function global_variables_initializer(), which creates an operation in the graph that initializes all variables. Nevertheless, we can initialize a variable that depends on the result of initializing another variable, as follows:

sess = tf.Session()
first_var = tf.Variable(tf.lin_space(0., 7, 8), name='1st_var')
# first_var: <tf.Variable '1st_var:0' shape=(8,), dtype=float32_ref>

# second_var dimensions depends on first_var
second_var = tf.Variable(tf.zeros_like(first_var), name='2nd_var')
# second_var: <tf.Variable '2nd_var:0' shape=(8,), dtype=float32_ref>

Placeholders just hold the position for data to be fed into the graph. To put a placeholder on the graph, we must perform at least one operation on it.

import numpy as np

sess = tf.Session()
x = tf.placeholder(tf.float32, shape=[2, 2])
# y is the operation to run on the x placeholder
y = tf.identity(x)

# x_vals is data to feed into the x placeholder
x_vals = np.random.rand(2, 2)
# Run the y operation, feeding x_vals into the x placeholder, feed_dict={x: x_vals})

TensorFlow will not return a self-referenced placeholder in the feed dictionary.

With these concepts clear, we can move forward with TensorFlow. See you next time with more TF!

Declaring tensors in TensorFlow

[Requirement: TensorFlow and NumPy installed on Python 3.5+]
[Requirement: import tensorflow as tf]
[Requirement: import numpy as np]

Tensors are the primary data structure we use in TensorFlow, and, as Wikipedia describes them, “tensors are geometric objects that describe linear relations between geometric vectors, scalars and other tensors”. Tensors can be described as multidimensional arrays, embracing the concepts of scalar, vector and matrix, without taking the coordinate system into consideration.

The tensor order is the number of indexes we need to specify one element; so, a scalar is an order-0 tensor, a vector is an order-1 tensor, a matrix an order-2 tensor, and so on.

Fig 1. Order 3 Tensor

Now that we know what a tensor is, I’m going to show you how we can declare tensors in TensorFlow.

1. Fixed tensors

>>> zeros_tsr = tf.zeros([5, 5], dtype=tf.int32, name='zeros5x5')
>>> print(zeros_tsr)
Tensor("zeros5x5:0", shape=(5, 5), dtype=int32)
>>> tf.InteractiveSession().run(zeros_tsr)
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
>>> ones_tsr = tf.ones([5, 5], dtype=tf.float32, name='ones5x5')
>>> print(ones_tsr)
Tensor("ones5x5:0", shape=(5, 5), dtype=float32)
>>> tf.InteractiveSession().run(ones_tsr)
array([[ 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1.]], dtype=float32)
>>> filled_tsr = tf.fill([5, 5], 123, name='filled123')
>>> print(filled_tsr)
Tensor("filled123:0", shape=(5, 5), dtype=int32)
>>> tf.InteractiveSession().run(filled_tsr)
array([[123, 123, 123, 123, 123],
      [123, 123, 123, 123, 123],
      [123, 123, 123, 123, 123],
      [123, 123, 123, 123, 123],
      [123, 123, 123, 123, 123]])
>>> filled2_tsr = tf.constant(123, shape=[5, 5], name='filled123_2', dtype=tf.int16)
>>> print(filled2_tsr)
Tensor("filled123_2:0", shape=(5, 5), dtype=int16)
>>> tf.InteractiveSession().run(filled2_tsr)
array([[123, 123, 123, 123, 123],
       [123, 123, 123, 123, 123],
       [123, 123, 123, 123, 123],
       [123, 123, 123, 123, 123],
       [123, 123, 123, 123, 123]], dtype=int16)
>>> constant_tsr = tf.constant([1, 2, 3], name='vector')
>>> print(constant_tsr)
Tensor("vector:0", shape=(3,), dtype=int32)
>>> tf.InteractiveSession().run(constant_tsr)
array([1, 2, 3])

2. Copying dimensions

It is necessary to previously define tensors from which we are going to copy dimensions.

>>> zeros_similar = tf.zeros_like(constant_tsr)
>>> print(zeros_similar)
Tensor("zeros_like:0", shape=(3,), dtype=int32)
>>> tf.InteractiveSession().run(zeros_similar)
array([0, 0, 0])
>>> ones_similar = tf.ones_like(constant_tsr)
>>> print(ones_similar)
Tensor("ones_like:0", shape=(3,), dtype=int32)
>>> tf.InteractiveSession().run(ones_similar)
array([1, 1, 1])

3. Sequence tensors

# This tensor defines 7 regular intervals between 0 and 2, 1st param should be float32/64
>>> linear_tsr = tf.linspace(0., 2, 7)
>>> print(linear_tsr)
Tensor("LinSpace_5:0", shape=(7,), dtype=float32)
>>> tf.InteractiveSession().run(linear_tsr)
array([ 0. , 0.33333334, 0.66666669, 1. , 1.33333337,
         1.66666675, 2. ], dtype=float32)
# This tensor defines 4 elements between 6 and 17, with a delta of 3
>>> int_seq_tsr = tf.range(start=6, limit=17, delta=3)
>>> print(int_seq_tsr)
Tensor("range_1:0", shape=(4,), dtype=int32)
>>> tf.InteractiveSession().run(int_seq_tsr)
array([ 6, 9, 12, 15])
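The semantics of the two sequence helpers are easy to mimic in plain Python (hypothetical re-implementations just to show the behavior, not TensorFlow code):

```python
def linspace(start, stop, num):
    # num evenly spaced values with both endpoints included, like tf.linspace
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

def step_range(start, limit, delta):
    # values from start up to (but excluding) limit, like tf.range
    out, v = [], start
    while v < limit:
        out.append(v)
        v += delta
    return out

seq = linspace(0.0, 2.0, 7)   # 7 values covering [0, 2]
ints = step_range(6, 17, 3)   # [6, 9, 12, 15]
```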

4. Random tensors

# Random numbers from uniform distribution
>>> rand_unif_tsr = tf.random_uniform([5, 5], minval=0, maxval=1)
>>> print(rand_unif_tsr)
Tensor("random_uniform:0", shape=(5, 5), dtype=float32)
>>> tf.InteractiveSession().run(rand_unif_tsr)
array([[ 0.81911492, 0.01300693, 0.47359812, 0.50176537, 0.27962267],
       [ 0.47069478, 0.7151444 , 0.56615186, 0.5431906 , 0.45684898],
       [ 0.00939894, 0.19539773, 0.37774849, 0.08342052, 0.87758613],
       [ 0.46707201, 0.32422674, 0.90311491, 0.42251813, 0.3496896 ],
       [ 0.75080729, 0.48055971, 0.49421525, 0.77542639, 0.99400854]], dtype=float32)
# Random numbers from normal distribution
>>> rand_normal_tsr = tf.random_normal([5, 5], mean=0.0, stddev=1.0)
>>> print(rand_normal_tsr)
Tensor("random_normal:0", shape=(5, 5), dtype=float32)
>>> tf.InteractiveSession().run(rand_normal_tsr)
array([[ 2.13312769, 2.46189046, -0.34942248, -0.39776739, 1.79048693],
       [ 0.22045165, 0.05164593, -1.05943978, -0.32593197, -1.66411078],
       [-0.94263768, 1.77081263, -0.22290479, -0.24516548, 1.26560402],
       [-1.14855564, -0.89211422, 1.10751343, -2.17768288, -1.07004178],
       [ 0.635813 , 0.24745767, 0.80117846, -0.25315794, -1.88987064]], dtype=float32)
# Random numbers from a normal distribution, limiting values to within 2 SD of the mean
>>> trunc_norm_tsr = tf.truncated_normal([5, 5], mean=0.0, stddev=1.0)
>>> print(trunc_norm_tsr)
Tensor("truncated_normal:0", shape=(5, 5), dtype=float32)
>>> tf.InteractiveSession().run(trunc_norm_tsr)
array([[-1.07923198, 0.66122717, -0.98569149, -0.11161296, -1.39560068],
       [-1.04248953, 1.83589756, 0.00709002, -0.70119679, 0.81637812],
       [-1.14046562, 0.65371871, 0.25081205, 1.59802651, -0.17030434],
       [ 0.61106592, -0.39884251, -0.02136615, 0.36585283, -1.45338166],
       [-0.64861351, 0.930076 , -0.1549242 , 1.45601475, 0.56357914]], dtype=float32)
# Shuffles existing tensor
>>> seq = tf.linspace(0., 7, 8)
>>> tf.InteractiveSession().run(seq)
array([ 0., 1., 2., 3., 4., 5., 6., 7.], dtype=float32)
>>> tf.InteractiveSession().run(tf.random_shuffle(seq))
array([ 5., 0., 4., 1., 6., 7., 2., 3.], dtype=float32)
# Randomly crops existing tensor to the specified dimension
>>> tf.InteractiveSession().run(tf.random_crop(seq, [3,]))
array([ 2., 3., 4.], dtype=float32)

5. Converted from NumPy

>>> np_array = np.array([[1, 2], [3, 4]])
>>> np_tsr = tf.convert_to_tensor(np_array, dtype=tf.int32)
>>> print(np_tsr)
Tensor("Const:0", shape=(2, 2), dtype=int32)
>>> tf.InteractiveSession().run(np_tsr)
array([[1, 2],
       [3, 4]])

Once you have created your desired tensor, you should wrap it as a TensorFlow Variable, like this:

seq_var = tf.Variable(seq)

That’s all for today. See you next time with more TensorFlow!

Java Streams API in brief

First, let’s define what a stream is in Java 8: a sequence of functions, actions, inputs, and outputs (better described as a “pipeline”). The Streams API provides functional-style operations to transform these sequences; their sources can be arrays, collections, files, etc. In general terms, streams are monads:

“Monads represent computations to be executed in a sequential (or parallel, in Java Streams) structure.”

The Streams API is a great starting point for data preparation, and, if we compare it with its Python equivalents, Java provides us a very strong alternative for data science:

  • As a statically typed language, the compiler can detect errors and bugs.
  • Java bytecode is faster than scripting languages such as R or Python.
  • Library versions in projects can be easily maintained with Maven or Gradle.
  • Common big data frameworks such as Apache Hadoop or Spark are written in Java or JVM languages.
  • Creating models in Java makes it easier to integrate them into production systems, which are usually written in Java or similar languages.

Let’s start some examples defining a simple Car class:

class Car {
    private final String name;
    private final Country origin;
    Car(String name, Country origin) { = name;
        this.origin = origin;
    }
    public String getName() { return name; }
    public Country getOrigin() { return origin; }
}

final class Country {
    private final int value;
    private final String name;
    public static Country GERMANY = new Country(1, "GERMANY");
    public static Country US = new Country(2, "US");
    public static Country UK = new Country(3, "UK");
    public static Country INDIA = new Country(4, "INDIA");
    public static Country JAPAN = new Country(5, "JAPAN");
    private Country(int value, String name) {
        this.value = value; = name;
    }
    public int getValue() { return value; }
    public String getName() { return name; }
}

The Streams API usually works with collections, so a useful way to convert an array into a collection is Arrays.asList():

Car[] cars = {
        new Car("GM", Country.US),
        new Car("Cadillac", Country.US),
        new Car("BMW", Country.GERMANY),
        new Car("Mercedes Benz", Country.GERMANY),
        new Car("Toyota", Country.JAPAN),
        new Car("Mazda", Country.JAPAN),
        new Car("Honda", Country.JAPAN),
        new Car("Mahindra", Country.INDIA),
        new Car("Land Rover", Country.UK)
};

List<Car> list = Arrays.asList(cars);
Stream<Car> stream =;

Streams are not reusable; they have to be recreated in order to start a new processing pipeline. Because of this, I’m going to create the stream with in all the examples, as follows:

List<String> germanCars =
        .filter(x -> x.getOrigin().equals(Country.GERMANY))
        .map(Car::getName)
        .collect(Collectors.toList());

This piece of code shows some useful Stream functions:

1. Filtering.

It works directly on the stream and receives a lambda function to evaluate the filter; it returns a filtered Stream.

2. Mapping

It provides a useful way of selecting a specific data member of the Car class, “mapping” the stream of Cars into a stream of Strings.

3. Collecting

It transforms the stream into lists, sets, strings, etc., using the Collectors class, which provides useful transformation methods, such as Collectors.toList().

Three more examples:

String rawSentence = -> x.getName())
        .collect(Collectors.joining(", "));
Set<String> countries = -> x.getOrigin().getName())
        .collect(Collectors.toSet());
Map<Country, List<Car>> groupByCountry =
        .collect(Collectors.groupingBy(Car::getOrigin));

In these last examples, we can see that we can chain operations on a stream over and over again, creating the “pipelines” we mentioned before.

There is a useful toMap() collector that can index a collection using some fields. For example, if we want to get a map from Car names to Car objects, it can be achieved using the following code:

Map<String, Car> tokenToWord =
        .collect(Collectors.toMap(Car::getName, Function.identity()));

The Streams API also provides streams of primitives (ints, doubles, and others), which have basic statistical methods such as sum, max, min, average, and summaryStatistics. For example:

int maxNameLength =
        .mapToInt(x -> x.getName().length())
        .max()
        .orElse(0);

Custom Collectors

We can also define our own collector using Collector.of(), which requires a supplier, an accumulator, a combiner, and a finisher. For example:

Collector<Car, StringJoiner, String> carsCollector = Collector.of(
        () -> new StringJoiner(", "),                              // supplier
        (joiner, car) -> joiner.add(car.getName().toUpperCase()),  // accumulator
        StringJoiner::merge,                                       // combiner
        StringJoiner::toString);                                   // finisher
String names =;

4. Parallelizing

Streams can be executed in parallel, taking advantage of the available physical CPU cores. Streams use a common ForkJoinPool; we can check the size of the underlying thread pool by using ForkJoinPool.commonPool() (7 threads on my PC). For example:

ForkJoinPool commonPool = ForkJoinPool.commonPool();
System.out.println("Threads: " + commonPool.getParallelism());
int[] firstLengths = list.parallelStream()
        .filter(w -> w.getOrigin().equals(Country.JAPAN))
        .mapToInt(w -> w.getName().length())
        .toArray();

5. I/O

Finally, we can use the Java I/O library; for example, a file can be represented as a stream of lines using the BufferedReader.lines() method.

InputStream fStream = Main.class.getResourceAsStream("text.txt");
InputStreamReader fReader = new InputStreamReader(fStream, StandardCharsets.UTF_8);
try (Stream<String> lines = new BufferedReader(fReader).lines()) {
    double average = lines
            .flatMap(line ->" ")))
            .mapToInt(String::length)
            .average()
            .orElse(0);
    System.out.println("average token length: " + average);
}


Streams are a very powerful feature introduced in Java 8 (and extended in Java 9), and very useful for processing data for data science in Java. There are many more useful methods in the Streams API; more examples in the future.

Go to GitHub Repo:

Is it worth it to work as a freelancer?


Many of us college graduates have dreamed of working in a large company, with a great salary, prestige, and travel around the world; or maybe we’ve had the “million-dollar idea” and put tears, sweat and blood into booting up that beloved startup and appearing in Wired as the startup of the year. No doubt many have experienced one of these two scenarios, which is commendable; for other people, perhaps this was not as important as leading a quiet, family life with a modest and stable job, without the need to experience the jet lag of constant trips or the stressful deadlines for launching the product; those who have chosen this option have chosen the quiet life and are also to be congratulated.

But there are those who have found no satisfaction in either case, whether due to failures, lost opportunities, or simply because the opportunity never presented itself. The establishment may think of them as “losers”, but nothing could be further from the truth, as they may be potential or emerging freelancers.


Some years before starting my adventure as a freelancer, I failed twice with a startup that never launched or succeeded, and missed opportunities to work at spectacular companies. This left me with the thought that maybe that path was not for me, and I decided to take a normal job and lead a quiet life. However, deep within me there was still a spark of entrepreneurship, and on the day my daughter was born, the calling was reborn. It might seem like this was not the best time to do it, but there was now a need to organize my time much better, so I could make the most of the hours of the day and see my daughter grow up.

Since then I have learned many things that I once considered obvious, but now I consider important in freelance work:

1. Enjoy the hours of the day

With this I don’t mean working 20 hours a day; rather, it is important to give each task its time and space, and execute it in an excellent way: work well, get enough sleep, rest as necessary, good time with family, good time for leisure. The unsuccessful tasks are those that don’t contribute.

2. Dress for success

From experience I can say that your body “feels” when we are going to work and becomes predisposed to perform efficiently. Many people think that freelancers work in pajamas, but doing so will simply predispose the body to sleepiness.

3. Do not neglect your work

At first we do not know what price to put on our work; one tends to think that we could charge too much and lose the customer, or too little and become a bargain. The answer may actually be a midpoint, but you need to consider that you will probably have to cover the following costs:

  • Error correction
  • Delays in deadlines or payments
  • Unexpected expenses and delays

It is important to say that often the client is not responsible for these inconveniences, but the wise freelancer should be prepared. An approximate buffer is ~20% of the estimated time. If all goes well and there are no unexpected events, that surplus will be a backup until you get the next client or project.

4. The horse prepares for battle

The worst thing we can do is feel comfortable with what we learned the previous year; the skills required of a freelancer are always changing, and continuous learning prepares us to take on each project professionally.

5. Personal branding is vital

Personally I am not a fan of social networks; however, promoting our name is very important, mainly with the idea of projecting ourselves as professionals. The important resources to take into account are:

  • LinkedIn
  • Personal Site or Blog/Vlog
  • GitHub
  • Twitter

6. Search and you will find

Being persistent is extremely important. Although you can wait to “capture” a project in order to begin, once you start walking as a freelancer you will have to go looking for projects. Hopefully you will find a project right away, and you will have to enter the field of being project manager and developer at the same time; but usually starting a new project with a client takes a little time, in my experience about 2 months between starting the conversations and writing the first line of code.


The adventure of being a freelancer is exciting because it involves two worlds: being an entrepreneur and being an employee; starting up involves a great effort until you reach the necessary inertia and know-how. Are you wondering if it can be lucrative? Of course; it is lucrative enough not to envy employment at any large company, but it will require more of your time. If you value family life and leisure time, it is possible to find a balance between income and free time to devote to other activities.

One very interesting option is being part of freelancing platforms like the Toptal Web Engineering Group, which considers the common needs of freelancers around the world. The challenging task of getting clients is performed by Toptal, but it requires you to be the best at what you do, because only “la crème de la crème” of developers is chosen to work with Toptal. So, it’s hard but not impossible. Also, you can’t compare Toptal with other portals like Elance or Freelancer, where it is too hard to get clients, even more complicated than finding projects by yourself in your local environment. I was able to work with Elance before, and it was good because of the client, but the competition can be very hard and it’s not always a win-win relationship.

For those who intend to take this step of faith my recommendation is only to do it with determination and dedication. It may be very helpful to have an economic backup to get you started, but adventurers may enjoy cruising the desert to get to the promised land. Go ahead and Bon Voyage!
