How to make a neural network in Python

Learn how to build a neural network in Python. This guide covers different methods, tips, real-world applications, and debugging common errors.

Published on: Fri, Feb 20, 2026
Updated on: Mon, Apr 6, 2026
The Replit Team

A neural network in Python is an excellent project to start your machine learning journey. Python’s libraries provide the tools to construct powerful models with surprisingly straightforward code.

In this article, we'll walk through the core techniques to build your first network. You'll get practical tips, see real-world applications, and receive essential debugging advice to guide your project.

Creating a simple neural network with NumPy

import numpy as np

class SimpleNN:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))

    def forward(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = np.tanh(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        return self.z2

This code defines a simple two-layer neural network using the SimpleNN class. The __init__ method sets up the network’s structure by initializing weights (W1, W2) with small random numbers. This is a key step that prevents neurons from all learning the same thing. Biases (b1, b2) are set to zero, providing a neutral starting point.

The forward method describes how data flows through the network. It uses np.dot for the vectorized matrix multiplication that connects the layers. The np.tanh function acts as an activation function, introducing the non-linearity that lets the network learn complex patterns rather than just linear relationships.
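As a quick check, you can instantiate the class and push a random batch through it; the batch size and shapes below are arbitrary, chosen only to illustrate the forward pass:

```python
import numpy as np

class SimpleNN:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))

    def forward(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = np.tanh(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        return self.z2

# Forward-pass a batch of 4 random inputs through a 784-128-10 network
nn = SimpleNN(input_size=784, hidden_size=128, output_size=10)
X = np.random.rand(4, 784)
out = nn.forward(X)
print(out.shape)  # (4, 10): one 10-element score vector per input
```

Checking the output shape like this is a cheap way to confirm the layer sizes line up before you write any training code.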

Popular neural network frameworks

While the NumPy approach is fundamental, frameworks like TensorFlow, PyTorch, and Keras streamline the process, letting you build more complex models with less code.

Using TensorFlow for neural networks

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

This TensorFlow example uses the tf.keras.Sequential model, which simplifies network construction by letting you stack layers in order. The configuration is much more abstract than the manual NumPy method.

  • The first Dense layer contains 128 neurons with a relu activation function.
  • A Dropout layer helps prevent overfitting by randomly ignoring 20% of neuron outputs during training.
  • The final Dense layer uses softmax to output probabilities across 10 classes.

Before training, model.compile() configures the learning process. It sets the adam optimizer, the loss function, and the performance metric, which is accuracy here. Once compiled, the model is ready for training with model.fit(), which is where data preparation and validation strategies come into play.
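To make the loss choice concrete, here's a small NumPy sketch of what sparse_categorical_crossentropy computes: the negative log-probability the softmax assigns to the correct class. This is a conceptual illustration, not TensorFlow's actual implementation:

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(logits, labels):
    # Negative log-probability of the correct class, averaged over the batch
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])  # integer class labels, not one-hot vectors
loss = sparse_categorical_crossentropy(logits, labels)
print(f"{loss:.4f}")
```

Note that the labels are plain integers; that is exactly what the "sparse" variant of the loss expects.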

Building a neural network with PyTorch

import torch
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

PyTorch uses a more explicit, object-oriented structure where you define your network as a class that inherits from nn.Module. The network’s layers are set up in the __init__ method, and the forward method dictates how data actually flows through them.

  • nn.Sequential is a container that neatly organizes the sequence of layers.
  • nn.Flatten prepares the input by converting it into a one-dimensional array.
  • nn.Linear and nn.ReLU define the connected layers and activation function, similar to the other frameworks.

The forward function is where the action happens—it takes an input x, passes it through the flatten layer, and then sends it through your defined stack to get the final output.
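If the class structure feels abstract, this NumPy sketch mirrors the same Flatten → Linear → ReLU → Linear pipeline with explicit arrays. The weight values here are random placeholders; in PyTorch they would be learned parameters:

```python
import numpy as np

# Mirror the PyTorch stack in NumPy: Flatten -> Linear(784, 128) -> ReLU -> Linear(128, 10)
rng = np.random.default_rng(0)
x = rng.random((2, 28, 28))            # a batch of two 28x28 "images"

flat = x.reshape(x.shape[0], -1)       # nn.Flatten: (2, 28, 28) -> (2, 784)
W1, b1 = rng.standard_normal((784, 128)) * 0.01, np.zeros(128)
W2, b2 = rng.standard_normal((128, 10)) * 0.01, np.zeros(10)

hidden = np.maximum(0, flat @ W1 + b1)  # nn.Linear followed by nn.ReLU
logits = hidden @ W2 + b2               # nn.Linear(128, 10)
print(logits.shape)  # (2, 10)
```

Tracing the shapes this way makes it clear why nn.Flatten must come first: nn.Linear(784, 128) can only consume one-dimensional inputs of length 784.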

Creating a neural network with Keras functional API

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,))
x = layers.Dense(128, activation="relu")(inputs)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

model.summary()

Output:

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_1 (InputLayer)        [(None, 784)]             0
 dense (Dense)               (None, 128)               100480
 dense_1 (Dense)             (None, 64)                8256
 dense_2 (Dense)             (None, 10)                650
=================================================================
Total params: 109,386
Trainable params: 109,386
Non-trainable params: 0
_________________________________________________________________

The Keras functional API offers a more flexible way to define models than the Sequential approach. It's especially useful for complex architectures. You explicitly define the data's path from input to output, which allows for more intricate designs like models with multiple inputs or outputs.

  • You start by creating an input node with keras.Input.
  • Each layer, such as layers.Dense, is then called on the output of the previous layer.
  • Finally, you create the model by specifying its inputs and outputs with keras.Model.

The model.summary() function then provides a clear, tabular view of your network's structure and parameters.
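You can verify the Param # column by hand: a Dense layer has one weight per input-output pair plus one bias per neuron:

```python
# Reproduce the Param # column from model.summary():
# a Dense layer has (inputs x units) weights plus one bias per unit.
dense   = 784 * 128 + 128   # first hidden layer
dense_1 = 128 * 64 + 64     # second hidden layer
dense_2 = 64 * 10 + 10      # output layer
total = dense + dense_1 + dense_2
print(dense, dense_1, dense_2, total)  # 100480 8256 650 109386
```

Doing this arithmetic occasionally is a good sanity check that your architecture is the size you think it is.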

Advanced neural network architectures

Building on those foundational frameworks, you can construct more specialized architectures like CNNs, RNNs, and Transformers to handle specific types of data and tasks.

Implementing a convolutional neural network (CNN)

import tensorflow as tf

cnn_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

cnn_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

Convolutional Neural Networks (CNNs) are ideal for image processing tasks. This model uses specialized layers to find visual patterns before the data is passed to standard Dense layers for classification. Understanding image processing in Python provides essential background for working with CNNs.

  • The Conv2D layer acts as a feature scanner, applying 32 different 3x3 filters to identify patterns like edges and textures in the input image.
  • MaxPooling2D then simplifies the output by downsampling, which helps the network focus on the most important features.
  • Finally, Flatten converts the 2D feature maps into a 1D vector that the final Dense layers can process.
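To demystify what Conv2D and MaxPooling2D actually compute, here's a minimal NumPy sketch of a single 3x3 convolution followed by 2x2 max pooling. It's a toy single-filter version with a hand-picked kernel, not the framework's optimized implementation:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide a 3x3 kernel over the image ("valid" padding, stride 1)
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Keep the maximum of each non-overlapping size x size window
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.rand(28, 28)
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])  # a simple vertical-edge detector
features = np.maximum(0, conv2d_valid(image, edge_kernel))  # relu
pooled = max_pool(features)
print(features.shape, pooled.shape)  # (26, 26) (13, 13)
```

A real Conv2D layer learns 32 such kernels instead of using a fixed one, but the sliding-window mechanics are the same.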

Building a recurrent neural network (RNN) with LSTM

import tensorflow as tf

lstm_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

lstm_model.compile(loss='binary_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])

Recurrent Neural Networks (RNNs) are designed for sequential data like text or time series. This model uses Long Short-Term Memory (LSTM) layers, a special type of RNN that’s excellent at remembering information over long sequences.

  • The Embedding layer converts integer-encoded words into dense vectors, which helps the model learn word relationships.
  • Two LSTM layers process the sequence. The first uses return_sequences=True to pass its full output to the second layer.
  • A final Dense layer with sigmoid activation outputs a probability, making this model ideal for binary classification tasks.
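The Embedding layer is easier to picture as a plain lookup table. This NumPy sketch shows the idea; the matrix here is random, whereas a real layer's rows are learned during training:

```python
import numpy as np

# An Embedding layer is a trainable lookup table: row i is the vector for word id i.
vocab_size, embed_dim = 10000, 64
embedding_matrix = np.random.randn(vocab_size, embed_dim) * 0.01

# A "sentence" of five integer-encoded words (ids chosen arbitrarily)
word_ids = np.array([42, 7, 1999, 7, 3])
vectors = embedding_matrix[word_ids]   # the lookup itself is just array indexing
print(vectors.shape)  # (5, 64): one dense vector per word
# Repeated word ids map to identical vectors
print(np.array_equal(vectors[1], vectors[3]))  # True
```

Because the lookup is just indexing, the heavy lifting is in training the matrix so that related words end up with similar rows.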

Creating a transformer-based neural network

import tensorflow as tf

class TransformerBlock(tf.keras.layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        self.att = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(ff_dim, activation="relu"),
            tf.keras.layers.Dense(embed_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.dropout2 = tf.keras.layers.Dropout(rate)

    def call(self, inputs, training=False):
        # Self-attention, then a residual connection and layer normalization
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        # Feed-forward network, then a second residual connection and normalization
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)

Transformers are powerful models, especially for natural language processing. This code defines a TransformerBlock, the fundamental component of a transformer network. You create it as a custom layer by inheriting from tf.keras.layers.Layer, which allows for a reusable and modular design.

  • The core is the MultiHeadAttention layer, which helps the model weigh the importance of different words in a sequence.
  • A feed-forward network, defined as ffn, processes the attention output.
  • LayerNormalization and Dropout are included to stabilize training and prevent the model from overfitting.
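At the heart of MultiHeadAttention is scaled dot-product attention. This single-head NumPy sketch shows the core computation; real layers add learned query/key/value projections and run several heads in parallel:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each query scores every key; softmax turns scores into weights over values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

seq_len, d_model = 4, 8
rng = np.random.default_rng(1)
Q = rng.standard_normal((seq_len, d_model))
K = rng.standard_normal((seq_len, d_model))
V = rng.standard_normal((seq_len, d_model))

out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one attended vector per sequence position
print(weights.sum(axis=-1))  # each row of attention weights sums to 1
```

The softmax rows are what let the model "weigh the importance" of each position: every output vector is a weighted average of the value vectors.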

Move faster with Replit

Replit is an AI-powered development platform that lets you skip setup and start coding instantly. It comes with all Python dependencies pre-installed, so you can focus on building instead of configuring environments.

This is where you can go from piecing together techniques to building complete applications with Agent 4. It takes your description of an app and handles the rest—writing the code, setting up databases, connecting APIs, and even deployment. Instead of just writing models, you can build a working product, like:

  • An image recognition tool that uses a CNN to identify objects in uploaded pictures.
  • A sentiment analysis app that processes customer reviews with an LSTM network to classify them as positive or negative.
  • A text summarizer that leverages a Transformer model to condense long articles into key bullet points.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Building a neural network often involves navigating a few common pitfalls, but they're usually straightforward to fix with the right know-how and a few debugging techniques.

Fixing input dimension errors when using the forward method

A frequent hurdle is the input dimension mismatch. This error typically pops up when your data's shape doesn't align with what the network's first layer expects during the forward pass. For example, your model might be configured for a 784-element vector, but you're feeding it a 28x28 matrix.

To fix this, you need to ensure your input data is correctly shaped. You can inspect its dimensions (e.g., with X.shape in NumPy) and use a reshaping or flattening function if it doesn't match the input_size or input_shape defined in your model. This simple check resolves one of the most common roadblocks.

Solving gradient vanishing with proper weight initialization

Vanishing gradients occur when the signals used for learning become too weak to update the network's early layers effectively, stalling the training process. This problem is often tied to how you initialize your network's weights. If the initial weights are too small, the gradients can shrink exponentially as they travel backward through the layers.

Proper weight initialization is the key. As shown in the NumPy example, starting with small random numbers (e.g., using np.random.randn(...) * 0.01) instead of zeros or large values helps keep the learning signal stable. Frameworks like TensorFlow and PyTorch often handle this with smart defaults, but understanding the principle is crucial for debugging.

Fixing loss calculation errors in neural networks

Errors in loss calculation usually stem from a mismatch between your model's final output and the format your loss function requires. The fix involves aligning three key components: your final layer's structure, its activation function, and the format of your training labels.

  • Output Layer: Ensure the number of neurons in your final Dense layer matches your task. For example, use 10 neurons for 10-class classification or a single neuron for binary classification.
  • Activation Function: The activation must fit the problem. Use softmax for multi-class classification to get a probability distribution, or sigmoid for a binary output between 0 and 1.
  • Label Format: Your target labels must match the loss function. For instance, sparse_categorical_crossentropy expects integer labels, while categorical_crossentropy requires one-hot encoded labels.
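A quick way to see the label-format difference: np.eye turns integer labels into one-hot rows, which is the format categorical_crossentropy expects:

```python
import numpy as np

# The same three labels in the two formats the two loss functions expect
labels_int = np.array([2, 0, 1])         # for sparse_categorical_crossentropy
labels_onehot = np.eye(3)[labels_int]    # for categorical_crossentropy
print(labels_onehot)  # rows: [0, 0, 1], [1, 0, 0], [0, 1, 0]
```

If your labels are already integers, the sparse loss saves you this conversion entirely.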

Fixing input dimension errors when using the forward method

An input dimension mismatch is a classic "welcome to machine learning" error. It happens when your data's shape doesn't align with what the network's first layer expects. See how a simple 28x28 image breaks the forward pass in the code below.

import numpy as np

# Create a simple neural network (using the SimpleNN class defined earlier)
nn = SimpleNN(input_size=784, hidden_size=128, output_size=10)

# Create incorrect input (2D instead of flattened)
image = np.random.rand(28, 28) # 28x28 image

# This will cause a dimension error
prediction = nn.forward(image)

The np.dot operation inside the forward method fails because the input image is a (28, 28) matrix. The network was built for a flat 784-element array, so the matrix dimensions are incompatible. The following code shows the fix.

import numpy as np

# Create a simple neural network
nn = SimpleNN(input_size=784, hidden_size=128, output_size=10)

# Reshape the image to match the expected input size
image = np.random.rand(28, 28) # 28x28 image
flattened_image = image.reshape(1, 784) # Flatten to 1x784

# Now the prediction works correctly
prediction = nn.forward(flattened_image)

The solution is to flatten the input data. The code uses image.reshape(1, 784) to transform the 28x28 image into the 1D array the network was designed for. This alignment resolves the dimension mismatch, allowing the forward method to process the data correctly. You'll often encounter this step when preparing raw data, like images, for a network's initial dense layer, so it's a good practice to always check your input shapes.

Solving gradient vanishing with proper weight initialization

The vanishing gradient problem can silently kill your network's ability to learn. It happens when updates sent back through the layers become so small they're almost zero, effectively freezing the training process, especially in deeper networks.

This issue often stems from how you initialize the weights. The code below shows how seemingly harmless small weights, combined with certain activation functions like np.tanh, can trigger this problem.

import numpy as np

class DeepNN:
def __init__(self, input_size, hidden_size, output_size):
# Poor initialization with small values
self.W1 = np.random.randn(input_size, hidden_size) * 0.01
self.b1 = np.zeros((1, hidden_size))
self.W2 = np.random.randn(hidden_size, output_size) * 0.01
self.b2 = np.zeros((1, output_size))

def forward(self, X):
self.z1 = np.dot(X, self.W1) + self.b1
self.a1 = np.tanh(self.z1) # tanh activation can cause vanishing gradients
self.z2 = np.dot(self.a1, self.W2) + self.b2
return self.z2

Weights initialized near zero with * 0.01 shrink the signal at every layer, and during backpropagation the gradients are multiplied by those same small weights again, so they decay exponentially as the network gets deeper. Saturating activations like np.tanh can compound the effect. The following code demonstrates a more robust initialization method.

import numpy as np

class DeepNN:
def __init__(self, input_size, hidden_size, output_size):
# Xavier initialization for tanh activation
self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(1/input_size)
self.b1 = np.zeros((1, hidden_size))
self.W2 = np.random.randn(hidden_size, output_size) * np.sqrt(1/hidden_size)
self.b2 = np.zeros((1, output_size))

def forward(self, X):
self.z1 = np.dot(X, self.W1) + self.b1
self.a1 = np.tanh(self.z1)
self.z2 = np.dot(self.a1, self.W2) + self.b2
return self.z2

The fix is to use a smarter weight initialization strategy called Xavier initialization. Instead of a fixed small number, the weights are scaled based on the number of input neurons, using np.sqrt(1/input_size). This method helps maintain a healthy gradient flow, preventing the learning signal from fading. It's especially useful for networks with tanh or sigmoid activations. Keep an eye on this if your model's training performance flatlines unexpectedly.
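You can observe the difference empirically by pushing one batch through a deep stack of tanh layers under both schemes and watching the signal's standard deviation. The layer count and sizes below are arbitrary, chosen only to make the effect visible:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 256))  # one batch of activations

def signal_std_after_layers(scale_fn, n_layers=10, size=256):
    # Apply n_layers of (matmul + tanh) and report the surviving signal strength
    a = x
    for _ in range(n_layers):
        W = rng.standard_normal((size, size)) * scale_fn(size)
        a = np.tanh(a @ W)
    return a.std()

tiny   = signal_std_after_layers(lambda n: 0.01)            # fixed small weights
xavier = signal_std_after_layers(lambda n: np.sqrt(1 / n))  # Xavier scaling
print(f"fixed 0.01: {tiny:.2e}, Xavier: {xavier:.4f}")
```

With the fixed 0.01 scale the signal collapses toward zero after a few layers, while Xavier scaling keeps it at a healthy magnitude; the gradients flowing backward behave analogously.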

Fixing loss calculation errors in neural networks

Errors in loss calculation often happen when your model's output doesn't match what the loss function expects. This is a classic mismatch between your final layer's activation, like softmax, and the chosen loss function, such as binary_crossentropy.

This misalignment can lead to confusing results or prevent your model from training correctly. The code below shows a common example where a multi-class model is incorrectly paired with a binary loss function, creating a conflict that needs to be resolved.

import tensorflow as tf

# Creating a model for multi-class classification
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')  # 10 classes with softmax
])

# Using binary crossentropy for multi-class problem
model.compile(optimizer='adam',
              loss='binary_crossentropy',  # Wrong loss for multi-class
              metrics=['accuracy'])

The model uses softmax for a 10-class problem, but the loss function is binary_crossentropy, which is meant for binary tasks. This fundamental mismatch causes the error. The following code demonstrates the correct configuration.

import tensorflow as tf

# Creating a model for multi-class classification
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')  # 10 classes with softmax
])

# Using the correct loss function for multi-class classification
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # Correct loss function
              metrics=['accuracy'])

The fix is to align the loss function with the model's output. Since the final layer uses softmax for 10 classes, it's a multi-class classification problem. The correct loss function is sparse_categorical_crossentropy, which is designed for this exact scenario, especially when your labels are integers.

The previous binary_crossentropy function was a mismatch because it’s built for binary tasks. Always check that your output layer, activation, and loss function are compatible to avoid training errors.

Real-world applications

Beyond the code and common errors, these networks power practical applications like image classification and time-series forecasting. With vibe coding, you can rapidly prototype these applications from natural language descriptions.

Using a neural network for image classification

A simple network can classify an image by first flattening its pixel data and then using a softmax function to predict the most probable class.

import numpy as np
from tensorflow import keras

# Create a simple neural network for image classification
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),  # Flatten 28x28 images
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')  # Output layer for 10 classes
])

# Generate a sample image (28x28 pixels)
sample_image = np.random.random((1, 28, 28))

# Predict the class of the image
predictions = model.predict(sample_image)
predicted_class = np.argmax(predictions[0])
confidence = predictions[0][predicted_class] * 100

print(f"Predicted digit: {predicted_class}")
print(f"Confidence: {confidence:.2f}%")

This code shows how a neural network classifies an image. A keras.Sequential model is built with three key layers:

  • A Flatten layer converts the 28x28 pixel image into a one-dimensional array.
  • A Dense layer with relu activation processes the flattened data.
  • The final Dense layer uses softmax to output probabilities across 10 classes.

After generating a random image, model.predict() gets the classification results. Finally, np.argmax() finds the most likely class by selecting the index with the highest probability score.

Using a neural network for time-series forecasting

A network with LSTM layers can analyze a sequence of data points to forecast the next value in a time series.

import numpy as np
import tensorflow as tf

# Create a simple LSTM model for time series forecasting
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(50, return_sequences=True, input_shape=(10, 1)),
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(1)
])

# Generate sample time series data (10 time steps)
time_series = np.array([np.sin(i) for i in range(10)]).reshape(1, 10, 1)

# Forecast the next value
forecast = model.predict(time_series)
print(f"Last value in series: {float(time_series[0, -1, 0]):.4f}")
print(f"Forecasted next value: {float(forecast[0, 0]):.4f}")

This code demonstrates how a model forecasts the next value in a sequence. It uses a tf.keras.Sequential model with two stacked LSTM layers, which are designed to find patterns in sequential data like a time series.

  • The first LSTM layer is configured with return_sequences=True, allowing it to pass the entire processed sequence to the next layer for deeper analysis.
  • The second LSTM layer then distills this information into a final output.
  • A single-neuron Dense layer produces the final forecast.

The model predicts the next point in a sample sine wave using model.predict().

Get started with Replit

Turn your knowledge into a working tool. Describe what you want to build to Replit Agent, like “a web app that classifies handwritten digits” or “a dashboard that forecasts stock prices using an LSTM.”

It will write the code, test for errors, and deploy your application for you. Start building with Replit and see your project come together in minutes.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.
