Selected Research

======

  • Accelerating Nearest Neighbor Search in 3D Point Cloud Registration on GPUs
    Qiong Chang, Weimin Wang, Jun Miyazaki
    ACM Transactions on Architecture and Code Optimization, 2025 [bib|DOI|code]
    [Figure: PCL: 1x | Open3D: 4x | Ours: 12x]
    Proposed a GPU-accelerated nearest neighbor search method for 3D point cloud registration, improving real-time performance when processing high-density spatial data.

  • Efficient Parallel Implementation of Non-Local Means Algorithm on GPU
    Xiang Li, Qiong Chang*, Yun Li and Jun Miyazaki
    17th Workshop on General Purpose Processing Using GPU (GPGPU2025), 2025 [bib|DOI]
    [Figure: Input | OpenCV-GPU: 1x | Ours: 5.5x]
    Proposed an efficient parallel GPU implementation of the 3D Non-Local Means (NLM) denoising algorithm, significantly speeding up high-resolution medical image processing.

  • An Optimized GPU Implementation for GIST Descriptor
    Xiang Li, Qiong Chang*, Aolong Zha, Shijie Chang, Yun Li, Jun Miyazaki
    ACM Transactions on Architecture and Code Optimization, 2024 [bib|DOI]
    [Figure: Input | cuFFT: 1x | Ours: 6.4x]
    Introduced an optimized GPU-based implementation of the GIST descriptor, significantly accelerating image feature extraction for large-scale visual processing tasks.

  • Multi-Directional Sobel Operator Kernel on GPUs
    Qiong Chang, Xiang Li, Yun Li, Jun Miyazaki
    Journal of Parallel and Distributed Computing, 2023 [bib|DOI]
    [Figure: Input | OpenCV-GPU: 1x | Ours: 11x]
    Proposed a GPU-accelerated multi-directional Sobel operator kernel for efficient parallel edge detection across multiple gradient orientations.

  • TinyStereo: A Tiny Coarse-to-Fine Framework for Vision-based Depth Estimation on Embedded GPUs
    Qiong Chang, Xin Xu, Aolong Zha, Yongqing Sun, Yun Li
    IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024 [bib|DOI|code]
    [Figure: Input | z2zncc (28 fps on Jetson TX2) | sRRNet (22 fps on Jetson TX2)]
    Implemented a lightweight coarse-to-fine stereo matching framework optimized for embedded GPUs, enabling efficient and accurate depth estimation under constrained computational resources.

  • Efficient Stereo Matching on Embedded GPUs with Zero-Means Cross Correlation
    Qiong Chang, Aolong Zha, Weimin Wang, Xin Liu, Masaki Onishi, Lei Lei, Tsutomu Maruyama
    Journal of Systems Architecture, 2022 [bib|DOI|code]
    [Figure: Left (original ZNCC): 10 fps | Right (proposed Z2ZNCC): 20 fps]
    Implemented fast zero-mean normalized cross-correlation (ZNCC) matching on embedded GPUs, offering an effective real-time alternative to the traditional Census transform for stereo matching.
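
For readers unfamiliar with ZNCC, the matching cost named in the last entry, the following is a minimal CPU sketch of the textbook definition applied to one scanline. It is not the Z2ZNCC method from the paper and not GPU code; the function names `zncc` and `best_disparity`, the 1D window, and the choice to score flat patches as 0 are all assumptions made for illustration.

```python
import math


def zncc(patch_a, patch_b):
    """Textbook zero-mean normalized cross-correlation of two equal-length patches.

    Returns a value in [-1, 1]; 1 means the patches match up to gain and offset.
    """
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and of equal length")
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    # Correlate the mean-subtracted intensities.
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(patch_a, patch_b))
    var_a = sum((a - mean_a) ** 2 for a in patch_a)
    var_b = sum((b - mean_b) ** 2 for b in patch_b)
    denom = math.sqrt(var_a * var_b)
    if denom == 0.0:
        # Assumption of this sketch: a flat patch carries no texture, score it 0.
        return 0.0
    return num / denom


def best_disparity(left_row, right_row, x, half_window, max_disparity):
    """Hypothetical helper: pick the disparity d that maximizes ZNCC between
    the left window centered at x and the right window centered at x - d."""
    lo, hi = x - half_window, x + half_window + 1
    left_patch = left_row[lo:hi]
    best_d, best_score = 0, -2.0  # any valid ZNCC score is greater than -2
    for d in range(max_disparity + 1):
        if lo - d < 0:
            break  # right window would leave the image
        score = zncc(left_patch, right_row[lo - d:hi - d])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


# Synthetic scanlines: the right row is the left row shifted by 2 pixels,
# with a different gain and offset (right = 2 * left + 5).
left_row = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10]
right_row = [25, 105, 185, 105, 25, 25, 25, 25, 25, 25]
disparity, score = best_disparity(left_row, right_row, x=4, half_window=1, max_disparity=3)
```

Because the mean is subtracted and the result is normalized, the score is unaffected by the gain and offset difference between the two rows, which is the usual motivation for ZNCC over raw intensity differences.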