Model

Model Architecture:
A PyTorch implementation of a generic CNN model for semantic segmentation of
 point clouds, using MinkowskiEngine for its sparse tensor computations.


1. conv1: MinkowskiConvolution layer with 9 input channels and 32 output channels, 
    kernel size of 5, stride of 1, and dilation of 1.
    
2. bn1: Minkowski batch normalization layer with 32 channels.

3. conv2: MinkowskiConvolution layer with 32 input channels and 64 output channels,
    kernel size of 3, stride of 2, and dilation of 1.
    
4. bn2: Minkowski batch normalization layer with 64 channels.

5. conv3: MinkowskiConvolution layer with 64 input channels and 25 output channels,
    kernel size of 3, stride of 2, and dilation of 1.
    
6. pool: MinkowskiGlobalMaxPooling layer, reducing the 25-channel output of conv3
    to one 25-dimensional feature vector per sample.

7. fc1: Fully connected linear layer with 25 input features and 64 output features.

8. ReLU activation function.

9. fc2: Fully connected linear layer with 64 input features 
    and `num_classes` output features.
    
10. Static learning rate scheduler.

11. Labels are one-hot encoded.

12. Loss function is `binary_cross_entropy_with_logits`.
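
To make the last two items concrete, the following is a minimal plain-Python sketch of one-hot encoding followed by a binary cross-entropy computed from raw logits. It is not the project's code: the helper names `one_hot` and `bce_with_logits` and the value `num_classes = 4` are assumptions for illustration. The formula is the numerically stable form of the loss that PyTorch's `binary_cross_entropy_with_logits` evaluates with its default mean reduction.

```python
import math


def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all elements, computed from raw logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


num_classes = 4                      # hypothetical value, not from the source
target = one_hot(2, num_classes)     # class index 2 -> [0, 0, 1, 0]

# All-zero logits correspond to a probability of 0.5 for every class,
# so the loss is ln(2) regardless of the target.
loss_uninformed = bce_with_logits([0.0] * num_classes, target)

# Logits that strongly agree with the target give a loss close to zero.
loss_confident = bce_with_logits([-10.0, -10.0, 10.0, -10.0], target)
```

Because each class is scored by an independent sigmoid, this loss needs a float target vector of the same shape as the logits, which is why the labels are one-hot encoded rather than kept as class indices.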


In summary, the model is a 3D CNN tailored for sparse tensor computations, 
 employing MinkowskiEngine and PyTorch Lightning for training and evaluation.

The architecture consists of three Minkowski convolutional layers, 
 two batch normalization layers, a global max pooling layer, 
 and two fully connected layers.
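
As a quick consistency check on the layer list, the channel widths and strides can be traced in plain Python without MinkowskiEngine. This is only a bookkeeping sketch: the `ConvSpec` dataclass and the `check_chain` helper are hypothetical names introduced here, and the numbers are copied from the list above.

```python
from dataclasses import dataclass


@dataclass
class ConvSpec:
    name: str
    in_channels: int
    out_channels: int
    kernel_size: int
    stride: int


# Values copied from the layer list; nothing here calls MinkowskiEngine.
convs = [
    ConvSpec("conv1", 9, 32, 5, 1),
    ConvSpec("conv2", 32, 64, 3, 2),
    ConvSpec("conv3", 64, 25, 3, 2),
]
fc1_in_features = 25  # fc1 consumes the pooled conv3 features


def check_chain(convs, fc_in):
    """Check that consecutive layers agree on channel width and
    return the cumulative stride of the convolution stack."""
    for prev, cur in zip(convs, convs[1:]):
        assert prev.out_channels == cur.in_channels, (prev.name, cur.name)
    # Global pooling keeps the channel count, so conv3 must match fc1.
    assert convs[-1].out_channels == fc_in
    total_stride = 1
    for conv in convs:
        total_stride *= conv.stride
    return total_stride


total_stride = check_chain(convs, fc1_in_features)  # -> 4
```

A cumulative stride of 4 means the features entering the global pooling layer live on a coordinate grid four times coarser than the input along each axis.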

The model is optimized using SGD with a step-based learning rate scheduler.
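
For reference, a step-based schedule can be written as a one-line rule. The sketch below follows the closed form used by PyTorch's `StepLR` scheduler; the function name `step_lr` and the hyperparameter values are assumptions, since the source does not state them.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate at `epoch` under a step schedule:
    the rate is multiplied by `gamma` once every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Hypothetical hyperparameters; the source does not give them.
base_lr, step_size, gamma = 0.1, 10, 0.5

schedule = [step_lr(base_lr, e, step_size, gamma) for e in (0, 9, 10, 25)]
# -> [0.1, 0.1, 0.05, 0.025]

# With gamma = 1.0 the same rule leaves the learning rate constant.
constant = step_lr(base_lr, 25, step_size, 1.0)
```

Setting `gamma` to 1.0 turns the step rule into a static learning rate, so both descriptions can be covered by the same scheduler configuration.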


