Selected Research

======

Accelerating Nearest Neighbor Search in 3D Point Cloud Registration on GPUs
Qiong Chang, Weimin Wang, Jun Miyazaki
ACM Transactions on Architecture and Code Optimization, 2025 [bib|DOI|code]

Figure panels: PCL (1x); Open3D (4x); Ours (12x)
Proposed a GPU-accelerated nearest neighbor search for 3D point cloud registration, with a reported speedup of about 12x over PCL (Open3D: 4x), improving real-time performance on high-density spatial data.

Efficient Parallel Implementation of Non-Local Means Algorithm on GPU
Xiang Li, Qiong Chang*, Yun Li and Jun Miyazaki
17th Workshop on General Purpose Processing Using GPU (GPGPU2025), 2025 [bib|DOI]

Figure panels: Input; OpenCV-GPU (1x); Ours (5.5x)
Proposed an efficient parallel GPU implementation of the 3D Non-Local Means (NLM) denoising algorithm, about 5.5x faster than the OpenCV GPU implementation, for high-resolution medical image processing.

An Optimized GPU Implementation for GIST Descriptor
Xiang Li, Qiong Chang*, Aolong Zha, Shijie Chang, Yun Li, Jun Miyazaki
ACM Transactions on Architecture and Code Optimization, 2024 [bib|DOI]

Figure panels: Input; cuFFT (1x); Ours (6.4x)
Introduced an optimized GPU implementation of the GIST descriptor, about 6.4x faster than a cuFFT-based baseline, accelerating image feature extraction for large-scale visual processing.

Multi-Directional Sobel Operator Kernel on GPUs
Qiong Chang, Xiang Li, Yun Li, Jun Miyazaki
Journal of Parallel and Distributed Computing, 2023 [bib|DOI]

Figure panels: Input; OpenCV-GPU (1x); Ours (11x)
Proposed a multi-directional Sobel operator kernel for GPUs that detects edges across multiple gradient orientations in parallel, about 11x faster than the OpenCV GPU implementation.

TinyStereo: A Tiny Coarse-to-Fine Framework for Vision-based Depth Estimation on Embedded GPUs
Qiong Chang, Xin Xu, Aolong Zha, Yongqing Sun, Yun Li
IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024 [bib|DOI|code]
Figure panels: Input; Z2ZNCC (28 fps on Jetson TX2); sRRNet (22 fps on Jetson TX2)
Implemented a lightweight coarse-to-fine stereo matching framework optimized for embedded GPUs, enabling efficient and accurate depth estimation under constrained computational resources.

Efficient Stereo Matching on Embedded GPUs with Zero-Means Cross Correlation
Qiong Chang, Aolong Zha, Weimin Wang, Xin Liu, Masaki Onishi, Lei Lei, Tsutomu Maruyama
Journal of Systems Architecture, 2022 [bib|DOI|code]
Figure panels: Left, original ZNCC (10 fps); Right, proposed Z2ZNCC (20 fps)
Implemented fast ZNCC feature matching on embedded GPUs, doubling the frame rate of the original ZNCC (10 fps to 20 fps) and offering an effective real-time alternative to the traditional Census transform in stereo matching.

Qiong Chang