A Simple Example of Parsing Web Page Content in Java

Posted 2015-06-17 15:37:24 | Source: webkfa | Author: Java哥

Summary: a simple example that downloads a web page in Java and parses its links with jsoup.

Prerequisites
First download the jsoup JAR that this example depends on.
jsoup-1.8.2.jar download (Baidu Cloud):
http://pan.baidu.com/s/1hq1r5pM
Example

Java code

package com.webkfa.test;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Provided by webkfa (web开发技术)
 * Website:
 * http://www.webkfa.com
 */
   
public class Test {
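	/**
	 * A simple example of parsing web page content in Java.
	 * @param args command-line arguments (unused)
	 * @throws IOException if the page cannot be fetched
	 */
    public static void main(String[] args) throws IOException {
    	String url="http://www.webkfa.com";
    	// Download the page source and print it
    	String html=getHtmlContext(url).toString();
    	System.out.println(html);
    	// Parse the HTML into a DOM-like document
    	Document doc =Jsoup.parse(html);
    	// Select every <a> element and print its href attribute and link text
    	Elements links =doc.getElementsByTag("a");
    	for (Element link : links) {
    	     String linkHref = link.attr("href");
    	     String linkText = link.text();
    	     System.out.println(linkHref);
    	     System.out.println(linkText);
    	}
    }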
    /**
	 * Fetches the content of the given URL over HTTP.
	 * @param hpath the URL of the page to fetch
	 * @return the page content, decoded as UTF-8
	 * @throws IOException if the connection or the read fails
	 */
	public static StringBuffer getHtmlContext(String hpath) throws IOException{
		StringBuffer bf=new StringBuffer();
		URL uobj = new URL(hpath);
		HttpURLConnection httpUrl = (HttpURLConnection) uobj.openConnection();
		httpUrl.connect();
		// try-with-resources closes all three streams even if reading fails
		try (InputStream is = httpUrl.getInputStream();
				InputStreamReader isr = new InputStreamReader(is,"utf-8");
				BufferedReader br = new BufferedReader(isr)) {
			String line;
			while( (line = br.readLine()) != null ){
				bf.append(line).append("\r\n");
			}
		} finally {
			// Release the connection whether or not the read succeeded
			httpUrl.disconnect();
		}
		
		return bf;
	}
}
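
The href values printed by the example are the raw attribute values, so links written as relative paths come out relative. As a minimal sketch of one way to handle this, assuming the base URI of the page is known, jsoup's `Jsoup.parse(String html, String baseUri)` and `absUrl(String attributeKey)` can resolve them. The class name `AbsoluteLinks`, the sample HTML, and the path `/java/index.html` are hypothetical and used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AbsoluteLinks {
    /**
     * Returns the absolute URL of the first link in the given HTML,
     * resolved against baseUri, or an empty string if there is no link.
     */
    static String firstAbsoluteLink(String html, String baseUri) {
        // Passing a base URI lets jsoup resolve relative URLs later
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.getElementsByTag("a").first();
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        // Hypothetical markup with a relative href
        String html = "<p><a href=\"/java/index.html\">Java</a></p>";
        System.out.println(firstAbsoluteLink(html, "http://www.webkfa.com/"));
    }
}
```

In the example above, this would mean calling `Jsoup.parse(html, url)` instead of `Jsoup.parse(html)` and reading `link.absUrl("href")` inside the loop.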

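Documentation notes
Element objects (including Document) provide a range of DOM-like methods to find elements and to extract and manipulate their data. The main ones are listed below:
Finding elements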
getElementById(String id)
getElementsByTag(String tag)
getElementsByClass(String className)
getElementsByAttribute(String key) (and related methods)
Element siblings: siblingElements(), firstElementSibling(), lastElementSibling(); nextElementSibling(), previousElementSibling()
Graph: parent(), children(), child(int index)
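
A short sketch of how the lookup and navigation methods listed above might be combined. The class name `FindElementsDemo` and the sample markup are hypothetical; only the jsoup calls themselves come from the list.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class FindElementsDemo {
    // Hypothetical sample markup, used only for illustration
    static final String HTML =
        "<div id=\"menu\">"
      + "<a class=\"nav\" href=\"/a\">First</a>"
      + "<a class=\"nav\" href=\"/b\">Second</a>"
      + "<span>Third</span>"
      + "</div>";

    /** Runs each kind of lookup and joins the results into one string. */
    static String describe() {
        Document doc = Jsoup.parse(HTML);
        Element menu = doc.getElementById("menu");              // lookup by id
        Elements navLinks = doc.getElementsByClass("nav");      // lookup by class
        Elements withHref = doc.getElementsByAttribute("href"); // lookup by attribute
        Element first = menu.child(0);                          // first child element
        Element next = first.nextElementSibling();              // the second <a>
        Element last = first.lastElementSibling();              // the <span>
        return navLinks.size() + "," + withHref.size() + ","
             + menu.children().size() + "," + next.text() + ","
             + last.text() + "," + first.parent().id();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```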
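Element data
attr(String key) gets an attribute; attr(String key, String value) sets an attribute
attributes() gets all attributes
id(), className() and classNames()
text() gets the text content; text(String value) sets the text content
html() gets the element's inner HTML; html(String value) sets the element's inner HTML
outerHtml() gets the element's outer HTML, including the element's own tag
data() gets the data content (for example, of script and style tags)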
tag() and tagName()
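
The data accessors above could be exercised roughly as follows. This is a sketch under stated assumptions: the class name `ElementDataDemo` and the sample markup are hypothetical, and the exact whitespace of `html()` and `outerHtml()` depends on jsoup's output settings, so those two are only printed.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class ElementDataDemo {
    // Hypothetical sample markup, used only for illustration
    static final String HTML =
        "<html><head><script>var x = 1;</script></head><body>"
      + "<a id=\"home\" class=\"nav main\" href=\"/index.html\">Home <b>page</b></a>"
      + "</body></html>";

    static Element link() {
        return Jsoup.parse(HTML).getElementById("home");
    }

    static String href() { return link().attr("href"); }

    static String text() { return link().text(); }

    static String idClassTag() {
        Element a = link();
        return a.id() + "|" + a.className() + "|" + a.tagName();
    }

    /** attr(key, value) sets the attribute and returns the same element. */
    static String withTarget() {
        return link().attr("target", "_blank").attr("target");
    }

    /** data() returns the content of script and style elements; text() does not. */
    static String scriptData() {
        return Jsoup.parse(HTML).getElementsByTag("script").first().data();
    }

    public static void main(String[] args) {
        System.out.println(href());
        System.out.println(text());
        System.out.println(idClassTag());
        System.out.println(link().classNames());
        System.out.println(withTarget());
        System.out.println(scriptData());
        System.out.println(link().html());      // inner HTML of the <a>
        System.out.println(link().outerHtml()); // the <a> tag itself plus its content
    }
}
```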
Manipulating HTML and text
append(String html), prepend(String html)
appendText(String text), prependText(String text)
appendElement(String tagName), prependElement(String tagName)
html(String value)
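
As an illustration of the manipulation methods above, the following sketch builds up the content of a div and then replaces it. The class name `ModifyHtmlDemo` and the sample markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ModifyHtmlDemo {
    /** Builds a div and modifies its content with the append/prepend family. */
    static Element build() {
        // Hypothetical sample markup, used only for illustration
        Document doc = Jsoup.parse("<div id=\"box\"><p>middle</p></div>");
        Element box = doc.getElementById("box");
        box.prepend("<h2>title</h2>");           // parsed HTML added as first child
        box.append("<p>end</p>");                // parsed HTML added as last child
        box.appendElement("span").text("note");  // new empty element, then set its text
        box.prependText("intro");                // plain text node added at the start
        return box;
    }

    /** Summarises the element children and the div's own text. */
    static String summary() {
        Element box = build();
        return box.children().size() + "," + box.child(0).tagName() + ","
             + box.child(3).tagName() + ":" + box.child(3).text() + ","
             + box.ownText();
    }

    /** html(String value) discards the existing children and parses the new HTML. */
    static String afterReplace() {
        Element box = build();
        box.html("<em>replaced</em>");
        return box.children().size() + "," + box.child(0).tagName();
    }

    public static void main(String[] args) {
        System.out.println(summary());
        System.out.println(afterReplace());
    }
}
```

Note that `append` and `prepend` parse their argument as HTML, while `appendText` and `prependText` insert it as literal text.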
Official documentation: http://jsoup.org/apidocs/
