Model
Model Architecture:
A PyTorch implementation of a generic CNN model for semantic segmentation of
point clouds, using MinkowskiEngine for sparse tensor computations.
1. conv1: MinkowskiConvolution layer with 9 input channels and 32 output channels,
kernel size of 5, stride of 1, and dilation of 1.
2. bn1: Minkowski batch normalization layer with 32 channels.
3. conv2: MinkowskiConvolution layer with 32 input channels and 64 output channels,
kernel size of 3, stride of 2, and dilation of 1.
4. bn2: Minkowski batch normalization layer with 64 channels.
5. conv3: MinkowskiConvolution layer with 64 input channels and 25 output channels,
kernel size of 3, stride of 2, and dilation of 1.
6. pool: MinkowskiGlobalMaxPooling layer with 25 input channels.
7. fc1: Fully connected linear layer with 25 input features and 64 output features.
8. ReLU activation function.
9. fc2: Fully connected linear layer with 64 input features
and `num_classes` output features.
10. Static learning rate scheduler.
11. Labels are one-hot encoded.
12. Loss function is binary_cross_entropy_with_logits.
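Items 11 and 12 can be illustrated with a small plain-Python sketch of one-hot encoding followed by binary cross-entropy on raw logits. This is not the project's code: in PyTorch the loss would come from torch.nn.functional.binary_cross_entropy_with_logits, and the helper names, the value of num_classes, and the logits below are assumptions made up for illustration.

```python
import math


def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all elements, computed from raw logits.

    Uses the numerically stable form max(x, 0) - x * t + log(1 + exp(-|x|)),
    which equals -[t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))].
    """
    total = 0.0
    for x, t in zip(logits, targets):
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


num_classes = 4                   # hypothetical value
logits = [2.0, -1.0, 0.0, -3.0]   # made-up fc2 outputs for one sample
target = one_hot(0, num_classes)  # true class is 0
loss = bce_with_logits(logits, target)
```

Because each class is scored independently through a sigmoid, this loss treats the one-hot vector as num_classes separate binary targets rather than as a single softmax distribution.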
In summary, the model is a 3D CNN tailored for sparse tensor computations,
employing MinkowskiEngine and PyTorch Lightning for training and evaluation.
The architecture consists of three Minkowski convolutional layers,
two batch normalization layers, a global max pooling layer,
and two fully connected layers.
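As a bookkeeping aid, the layer widths and strides listed above can be written down as data and checked for consistency. The sketch below is plain Python, not MinkowskiEngine code; the table layout, the helper names, and the NUM_CLASSES value are assumptions introduced for illustration. It relies on the fact that global max pooling keeps the channel count unchanged, so conv3's 25 output channels feed fc1 directly.

```python
# (name, in_channels, out_channels, kernel_size, stride), as listed above
CONV_LAYERS = [
    ("conv1", 9, 32, 5, 1),
    ("conv2", 32, 64, 3, 2),
    ("conv3", 64, 25, 3, 2),
]

NUM_CLASSES = 10  # hypothetical value; the source only says `num_classes`

# (name, in_features, out_features)
FC_LAYERS = [
    ("fc1", 25, 64),
    ("fc2", 64, NUM_CLASSES),
]


def widths_chain(conv_layers, fc_layers):
    """Return True if every layer's input width equals the previous output width."""
    widths = [(c_in, c_out) for _, c_in, c_out, _, _ in conv_layers]
    widths += [(f_in, f_out) for _, f_in, f_out in fc_layers]
    return all(prev[1] == nxt[0] for prev, nxt in zip(widths, widths[1:]))


def cumulative_stride(conv_layers):
    """Multiply the per-layer strides to get the overall downsampling factor."""
    total = 1
    for _, _, _, _, stride in conv_layers:
        total *= stride
    return total


consistent = widths_chain(CONV_LAYERS, FC_LAYERS)  # True
downsampling = cumulative_stride(CONV_LAYERS)      # 4
```

The two stride-2 convolutions coarsen the voxel grid by a factor of 4 along each axis before the global pooling collapses each sample to a single 25-dimensional feature vector.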
The model is optimized using SGD with a step-based learning rate scheduler.
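A step-based schedule multiplies the learning rate by a fixed factor every few epochs. The sketch below shows the closed form in plain Python; it is the same rule that torch.optim.lr_scheduler.StepLR applies, but the base rate, step size, and decay factor are hypothetical values, since the source does not state them. With gamma set to 1.0 the rate stays constant, which corresponds to the static scheduler named in item 10.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate at `epoch` under step decay: base_lr * gamma^(epoch // step_size)."""
    return base_lr * gamma ** (epoch // step_size)


# Hypothetical settings, chosen only to make the decay visible.
base_lr, step_size, gamma = 0.1, 10, 0.5

# Rate at a few epochs: constant within each 10-epoch window, halved between windows.
schedule = [step_lr(base_lr, e, step_size, gamma) for e in (0, 9, 10, 25)]

# gamma = 1.0 gives a static learning rate.
static = [step_lr(base_lr, e, step_size, 1.0) for e in (0, 9, 10, 25)]
```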