Abstract
Search engines are a shortcut for obtaining information resources from the WWW quickly and effectively, and web spider technology is the key to a search engine. This thesis centers on the design and implementation of a spider robot. In line with the overall requirements of the search engine framework, it implements a web spider that roams the Internet, fetches web page data, and stores that data in a local database, laying a solid foundation for implementing the search engine.
The thesis first gives a brief overview of search engine research and its development, then introduces and analyzes search engine technology based on Web mining in detail, and then analyzes in detail the functions and search strategies that web spider technology implements.
The research mainly covers the following: analyzing how search engines work and the related search techniques; implementing a web spider that fetches Web pages from the Internet; and designing, in detail, the Web fetching module, the page storage module, and the database module. Building on existing web spider search strategies, the thesis uses a queue to design a simple yet effective search algorithm, which makes the web spider easier to implement.
Finally, the thesis systematically summarizes the main work planned as the next step for this project and offers a brief outlook.
Keywords: WWW; Web mining; Web spider; Search strategy; Web classification technology
Note: source code is not included.
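Since the thesis source code is not available, the queue-based search strategy and the fetch-then-store flow described in the abstract can only be illustrated with a minimal sketch. The one below assumes a breadth-first traversal, an SQLite table standing in for the local database, and a small in-memory dictionary (`FAKE_WEB`) standing in for the Internet; the names `crawl`, `LinkExtractor`, `FAKE_WEB`, and the `pages` table are all hypothetical and are not taken from the thesis.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, db, max_pages=100):
    """Breadth-first crawl starting at `seed`.

    `fetch(url)` is an assumed callback that returns the page HTML,
    or None when the page cannot be retrieved.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT)"
    )
    queue = deque([seed])   # FIFO queue of URLs waiting to be fetched
    seen = {seed}           # URLs already queued, to avoid revisiting
    order = []              # URLs in the order they were stored
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, html))
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)   # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    db.commit()
    return order


# Hypothetical four-page site used in place of real network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<a href="/c">C</a><a href="/">home</a>',
    "http://example.test/b": '<a href="/c">C</a>',
    "http://example.test/c": "<p>leaf</p>",
}

db = sqlite3.connect(":memory:")
visited = crawl("http://example.test/", FAKE_WEB.get, db)
```

A FIFO queue visits pages level by level from the seed, and the `seen` set keeps the spider from looping on back-links such as the link from `/a` to the home page. A real spider would replace `FAKE_WEB.get` with an HTTP fetch and would also need politeness rules such as rate limiting and robots.txt handling, which this sketch leaves out.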