A parallel Aho-Corasick (AC) approach, named PAC-k, is proposed for string matching in deep packet inspection (DPI). The proposed approach adopts graphic processing units (GPUs) to perform the string matching in parallel for high throughput. In parallel string matching, the boundary detection problem happens when a pattern is matched across chunks. The PAC-k approach solves the boundary detection problem because the number of characters to be scanned by a thread can reach the longest pattern length. An input string is divided into multiple sub-chunks with k characters. By adopting the new starting position in each sub-chunk for the failure transition, the required number of threads is reduced by a factor of k. Therefore, the overhead of terminating and reassigning threads is also decreased. In order to avoid the unnecessary overlapped scanning with multiple threads, a checking procedure is proposed that decides whether a new starting position is in the sub-chunk. In the experiments with target patterns from Snort and realistic input strings from DEFCON, throughputs are enhanced greatly compared to those of previous AC-based string matching approaches.
ThienLuan HO
Dankook University
Seung-Rohk OH
Dankook University
HyunJin KIM
Dankook University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
ThienLuan HO, Seung-Rohk OH, HyunJin KIM, "PAC-k: A Parallel Aho-Corasick String Matching Approach on Graphic Processing Units Using Non-Overlapped Threads" in IEICE TRANSACTIONS on Communications,
vol. E99-B, no. 7, pp. 1523-1531, July 2016, doi: 10.1587/transcom.2015EBP3411.
Abstract: A parallel Aho-Corasick (AC) approach, named PAC-k, is proposed for string matching in deep packet inspection (DPI). The proposed approach adopts graphic processing units (GPUs) to perform the string matching in parallel for high throughput. In parallel string matching, the boundary detection problem happens when a pattern is matched across chunks. The PAC-k approach solves the boundary detection problem because the number of characters to be scanned by a thread can reach the longest pattern length. An input string is divided into multiple sub-chunks with k characters. By adopting the new starting position in each sub-chunk for the failure transition, the required number of threads is reduced by a factor of k. Therefore, the overhead of terminating and reassigning threads is also decreased. In order to avoid the unnecessary overlapped scanning with multiple threads, a checking procedure is proposed that decides whether a new starting position is in the sub-chunk. In the experiments with target patterns from Snort and realistic input strings from DEFCON, throughputs are enhanced greatly compared to those of previous AC-based string matching approaches.
URL: https://globals.ieice.org/en_transactions/communications/10.1587/transcom.2015EBP3411/_p
Copy
@ARTICLE{e99-b_7_1523,
author={ThienLuan HO, Seung-Rohk OH, HyunJin KIM, },
journal={IEICE TRANSACTIONS on Communications},
title={PAC-k: A Parallel Aho-Corasick String Matching Approach on Graphic Processing Units Using Non-Overlapped Threads},
year={2016},
volume={E99-B},
number={7},
pages={1523-1531},
abstract={A parallel Aho-Corasick (AC) approach, named PAC-k, is proposed for string matching in deep packet inspection (DPI). The proposed approach adopts graphic processing units (GPUs) to perform the string matching in parallel for high throughput. In parallel string matching, the boundary detection problem happens when a pattern is matched across chunks. The PAC-k approach solves the boundary detection problem because the number of characters to be scanned by a thread can reach the longest pattern length. An input string is divided into multiple sub-chunks with k characters. By adopting the new starting position in each sub-chunk for the failure transition, the required number of threads is reduced by a factor of k. Therefore, the overhead of terminating and reassigning threads is also decreased. In order to avoid the unnecessary overlapped scanning with multiple threads, a checking procedure is proposed that decides whether a new starting position is in the sub-chunk. In the experiments with target patterns from Snort and realistic input strings from DEFCON, throughputs are enhanced greatly compared to those of previous AC-based string matching approaches.},
keywords={},
doi={10.1587/transcom.2015EBP3411},
ISSN={1745-1345},
month={July},}
Copy
TY - JOUR
TI - PAC-k: A Parallel Aho-Corasick String Matching Approach on Graphic Processing Units Using Non-Overlapped Threads
T2 - IEICE TRANSACTIONS on Communications
SP - 1523
EP - 1531
AU - ThienLuan HO
AU - Seung-Rohk OH
AU - HyunJin KIM
PY - 2016
DO - 10.1587/transcom.2015EBP3411
JO - IEICE TRANSACTIONS on Communications
SN - 1745-1345
VL - E99-B
IS - 7
JA - IEICE TRANSACTIONS on Communications
Y1 - July 2016
AB - A parallel Aho-Corasick (AC) approach, named PAC-k, is proposed for string matching in deep packet inspection (DPI). The proposed approach adopts graphic processing units (GPUs) to perform the string matching in parallel for high throughput. In parallel string matching, the boundary detection problem happens when a pattern is matched across chunks. The PAC-k approach solves the boundary detection problem because the number of characters to be scanned by a thread can reach the longest pattern length. An input string is divided into multiple sub-chunks with k characters. By adopting the new starting position in each sub-chunk for the failure transition, the required number of threads is reduced by a factor of k. Therefore, the overhead of terminating and reassigning threads is also decreased. In order to avoid the unnecessary overlapped scanning with multiple threads, a checking procedure is proposed that decides whether a new starting position is in the sub-chunk. In the experiments with target patterns from Snort and realistic input strings from DEFCON, throughputs are enhanced greatly compared to those of previous AC-based string matching approaches.
ER -