Original scientific paper

https://doi.org/10.20532/cit.2020.1004899

A Detection Method for Phishing Web Page Using DOM-Based Doc2Vec Model

Jian Feng ; Xi'an University of Science and Technology, Xi'an, China
Ying Zhang ; Information Technology Department for Head Office of SPD Bank, Xi'an, China
Yuqiang Qiao ; Xi'an University of Science and Technology, Xi'an, China


Full text: English pdf 1,041 Kb

pp. 19-31

downloads: 664


Abstract

Detecting phishing web pages is a challenging task. Existing detection methods based on the DOM (Document Object Model) mainly aim to extract structural characteristics, but ignore the overall representation of a web page and the semantic information that HTML tags may carry. This paper treats the DOM as a natural language, applies the Doc2Vec model to it, and learns structural semantics automatically in order to detect phishing web pages. First, the DOM structure of a retrieved web page is parsed to construct its DOM tree. Next, the Doc2Vec model vectorizes the DOM tree, and the semantic similarity between web pages is measured by the distance between their DOM vectors. Finally, hierarchical clustering is used to group the web pages. Experiments show that the proposed method achieves higher recall and precision for phishing classification than a DOM-based structural clustering method and a TF-IDF-based semantic clustering method. The results indicate that applying Paragraph Vector to the DOM, treated linguistically, is effective.
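
The three stages named in the abstract (DOM parsing, vectorization, hierarchical clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the tokenization, the linkage criterion, or the cut-off, so the names `DomTokenizer`, `dom_tokens`, `tag_vector`, `agglomerative` and the threshold of 0.3 are all assumptions. To stay runnable with the Python standard library alone, a bag-of-tags count vector stands in for the Doc2Vec embedding; in practice a Paragraph Vector implementation would be trained on the token sequences instead.

```python
from collections import Counter
from html.parser import HTMLParser
from math import sqrt


class DomTokenizer(HTMLParser):
    """Collects tags in document order, so a page reads like a 'sentence' of tags."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag)

    def handle_endtag(self, tag):
        self.tokens.append("/" + tag)


def dom_tokens(html):
    """Stage 1: parse the page and flatten its DOM into a token sequence."""
    parser = DomTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def tag_vector(tokens):
    """Stage 2 (stand-in): sparse tag counts instead of a learned Doc2Vec vector."""
    return Counter(tokens)


def cosine_distance(a, b):
    """Distance between two sparse vectors; 0 means same direction."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(vectors, threshold):
    """Stage 3: average-linkage agglomerative clustering, stopped at `threshold`."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break  # Remaining clusters are too dissimilar to merge.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy pages: two login forms with the same structure, one article page.
pages = [
    "<html><body><form><input><input><button>Log in</button></form></body></html>",
    "<html><body><form><input><input><button>Sign in</button></form></body></html>",
    "<html><body><h1>News</h1><p>a</p><p>b</p><p>c</p></body></html>",
]
vectors = [tag_vector(dom_tokens(page)) for page in pages]
clusters = agglomerative(vectors, threshold=0.3)
print(clusters)
```

On these toy pages the two login forms share an identical tag sequence and fall into one cluster, while the article page stays separate. A count vector ignores tag order and context, which is the kind of information a Doc2Vec embedding is meant to capture.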

Keywords

phishing detection, semantic similarity, Doc2Vec, DOM, clustering

Hrčak ID:

240969

URI

https://hrcak.srce.hr/240969

Publication date:

10 July 2020

Visits: 1,249 *