(this is part 2 of the “AI for Finance Course“)
Previously, you might have seen how the most basic form of a neuron simply cannot handle all the data thrown at it, specifically inputs that are at or near zero.
This means that if both of our inputs are zero, no matter how well our weights are tuned, the weighted sum will always be zero.
(and consequently, the sigmoid function will always output 0.5)
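Just to make this concrete, here is a tiny stand-alone check using the same tanh-based squashing function as the code below, plugged with the trained weights from part 1:

import std.stdio : writeln;
import std.math.trigonometry : tanh;

void main()
{
    double weight1 = 4.31008, weight2 = -1.00806;       // the trained weights from part 1
    double weightedSum = weight1 * 0.0 + weight2 * 0.0; // all-zero inputs: the sum is always 0
    writeln((tanh(weightedSum) + 1) / 2.0);             // prints 0.5, no matter the weights
}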
// [0.1, 0.2] : 0.612728 <-- problem
// [0.5, 1] : 0.908376
// [1, 0.5] : 0.999506
// [1, 1] : 0.998647
// [0, 0] : 0.5 <-- problem
This was clearly visible from our predictions after our neuron (perceptron) was fully trained:
- no matter what I do
- or how long I train it
- or how many trillions of data points I provide
=> the output for [0, 0] inputs will always be 0.5
Fixing Near-Zero Inputs
The solution is very simple. We just need to add a “constant” term to allow our neuron (which is basically just a function) to produce non-zero outputs for all-zero inputs.
So instead of having a neuron like this…
…we’re going to have a neuron like this:
INPUT0 is always going to be 1, and it will never change. The only thing that can be changed is WEIGHT0.
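In other words, the weighted sum now gains one extra term that does not depend on the data. A minimal sketch of the idea (the variable names and starting weights here are just for illustration):

import std.stdio : writeln;

void main()
{
    double input1 = 0.0, input2 = 0.0;                   // worst year on record
    double weight0 = 0.5, weight1 = 0.3, weight2 = 0.4;  // illustrative starting weights

    // INPUT0 is hard-coded to 1.0, so WEIGHT0 always contributes to the sum
    double weightedSum = weight0 * 1.0 + weight1 * input1 + weight2 * input2;
    writeln(weightedSum); // 0.5 -- no longer forced to zero by all-zero inputs
}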
Now, in our previous example without a constant INPUT0, we can see that our output was always 0.5 when all our inputs were zero.
// [0.1, 0.2] : 0.612728 <-- problem
// [0.5, 1] : 0.908376
// [1, 0.5] : 0.999506
// [1, 1] : 0.998647
// [0, 0] : 0.5 <-- problem
In this case, if we set our WEIGHT0 to -100 (or any other large negative number), the output of our squashing function will be close to zero.
And in our hypothetical AgroX example, this would mean we should sell this stock.
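Here is a quick back-of-the-envelope check of that claim, again with the same squashing function:

import std.stdio : writeln;
import std.math.trigonometry : tanh;

void main()
{
    // Worst year on record: INPUT1 = 0 and INPUT2 = 0, so only WEIGHT0 * INPUT0 remains
    double weightedSum = -100.0 * 1.0 + 0.0 + 0.0;
    writeln((tanh(weightedSum) + 1) / 2.0); // effectively 0 -> "sell"
}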
Reminder:
- INPUT1 = 0 means the temperature was lowest on record
- INPUT2 = 0 means the rain was lowest on record
This means the crop season will be terrible, therefore sales will be low and the stock price will fall. So we should sell (i.e. our network should output zero).
Fixing The Code
There is almost no change to the code from the previous lesson, so I’m not going to analyze it bit by bit. Only 3 lines changed, as indicated by the comments in the code.
The only difference is:
- now there are going to be 3 weights instead of two
- and I’ll add a constant term 1.0 to the inputs
Everything else will stay the same. I will re-run the training and we will see if the outputs are now fixed.
(i.e. whether the error is lower)
Here is the code in its entirety:
import std.stdio : pp = writeln;
import std.algorithm : sum;
import std.array : array;
import std.math.trigonometry : tanh;

double[] weights = [0.5, 0.3, 0.4]; // changed line

double[][] inputs = [
    [0.1, 0.2], // 0 - bad year - sell
    [0.5, 1.0], // 1 - avg temp & best rain - buy
    [1.0, 0.5], // 1 - best temp & avg rain - buy
    [1.0, 1.0], // 1 - best temp & best rain - BUY
    [0.0, 0.0], // 0 - worst year on record - SELL!!!
];

double[] outputs = [0, 1, 1, 1, 0];

double sigmoid(double input)
{
    return (tanh(input) + 1) / 2.0;
}

void updateWeights(double prediction, double[] inputs, double output)
{
    double error = prediction - output;
    pp("error: ", error);

    double[3] errorCorrections = ([1.0] ~ inputs)[] * error; // changed
    weights[] -= errorCorrections[];
    pp("weights: ", weights);
}

double getPrediction(double[] inputs)
{
    double[3] weightedInputs = ([1.0] ~ inputs)[] * weights[]; // changed
    double weightedSum = weightedInputs.array.sum;
    double prediction = sigmoid(weightedSum);
    return prediction;
}

void trainNetwork()
{
    for (int n = 0; n < 1000; n++) // 1000 times through the data
    {
        for (int i = 0; i < inputs.length; i++)
        {
            double prediction = getPrediction(inputs[i]);
            updateWeights(prediction, inputs[i], outputs[i]);
        }
    }
}

void main()
{
    pp("weights: ", weights);

    trainNetwork();

    // TESTING
    pp();
    pp(inputs[0], " : ", getPrediction(inputs[0]));
    pp(inputs[1], " : ", getPrediction(inputs[1]));
    pp(inputs[2], " : ", getPrediction(inputs[2]));
    pp(inputs[3], " : ", getPrediction(inputs[3]));
    pp(inputs[4], " : ", getPrediction(inputs[4]));
    pp();
}
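A quick note on the two lines marked // changed: they both rely on D’s ~ operator, which concatenates arrays. Prepending [1.0] is what injects the constant INPUT0 in front of the real inputs. Here is a small sketch of what those expressions evaluate to (standalone, with made-up input values):

import std.stdio : writeln;

void main()
{
    double[] inputs = [0.1, 0.2];
    double[] weights = [0.5, 0.3, 0.4];

    double[] withBias = [1.0] ~ inputs;  // ~ concatenation: [1, 0.1, 0.2]
    writeln(withBias);

    double[3] weighted;
    weighted[] = withBias[] * weights[]; // element-wise multiply (D array operation)
    writeln(weighted);                   // [0.5, 0.03, 0.08] -- ready to be summed
}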
And if you take a look at the outputs now, they are nearly perfect:
// [0.1, 0.2] : 0.00214298
// [0.5, 1] : 0.999116
// [1, 0.5] : 0.999389
// [1, 1] : 0.999997
// [0, 0] : 7.97197e-05 <-- excellent
And just as a reminder, these were our previous predictions where our neuron (perceptron) didn’t have the “constant” term.
(also known as bias)
// [0.1, 0.2] : 0.612728 <-- problem
// [0.5, 1] : 0.908376
// [1, 0.5] : 0.999506
// [1, 1] : 0.998647
// [0, 0] : 0.5 <-- problem
Also it might be interesting to see how the weights changed:
// OLD WEIGHTS
// [4.31008, -1.00806]

// NEW WEIGHTS
// [-4.71846, 5.73579, 5.36586]
In the end, WEIGHT0 didn’t need to be -100; at -4.71846 it was negative enough for our function to output the near-zero value of 7.97197e-05.
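If you want to double-check that number, plug the new WEIGHT0 back into the squashing function for the all-zero input:

import std.stdio : writeln;
import std.math.trigonometry : tanh;

void main()
{
    // For [0, 0] only the constant input survives: weightedSum = WEIGHT0 * 1.0
    double weightedSum = -4.71846 * 1.0;
    writeln((tanh(weightedSum) + 1) / 2.0); // ~7.97e-05, matching the test output above
}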
Now that our single neuron is as good as it can get, let’s jump into the next lesson:
NEXT LESSON:
Limits of The Single Neuron (AI)