Common XPath Functions and How to Use Them

Huginn uses XPath to locate links and titles, so these are my notes for future reference.

I. Locating nodes precisely
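1) `contains(str1,str2)` tests whether `str1` contains `str2` as a substring, and returns true or false.
Example 1: `//*[contains(@class,'c-summary c-row ')]` selects nodes whose `@class` value contains the literal substring `'c-summary c-row '`. This means the two class names in exactly that order, including the trailing space.
Example 2: `//div[contains(.//text(),'价格')]` selects `div` nodes whose text contains `价格` ("price"). XPath 1.0 converts a node-set argument to the string value of its first node, so only the first descendant text node is checked. `//div[contains(.,'价格')]` checks the div's full text instead.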
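
Both examples can be tried with the JDK's built-in XPath 1.0 engine (`javax.xml.xpath`). Here is a minimal sketch under some assumptions: the class `ContainsDemo` and the sample markup are hypothetical, and the English word `Price` stands in for `价格`. Real HTML would first have to be cleaned into well-formed XML before this parser accepts it. The third query shows the `contains(.,...)` variant.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContainsDemo {
    // Hypothetical sample markup; it must be well-formed XML for the JDK parser.
    static final String XML = "<root>"
            + "<div class='c-summary c-row '>summary A</div>"
            + "<div class='c-row c-summary'>summary B</div>"
            + "<div><span>Price</span>: 10</div>"
            + "<div><b>Total</b> Price</div>"
            + "</root>";

    // Evaluates an XPath 1.0 expression and returns the text content of every matched node.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Literal substring test: "summary B" has both classes, but not in this order.
        System.out.println(select("//*[contains(@class,'c-summary c-row ')]")); // [summary A]
        // Only the first descendant text node is tested, so "Total Price" is missed.
        System.out.println(select("//div[contains(.//text(),'Price')]")); // [Price: 10]
        // "." tests the div's whole string value, so both divs match.
        System.out.println(select("//div[contains(.,'Price')]")); // [Price: 10, Total Price]
    }
}
```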

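2) `position()` returns the position of the current node within the node-set being filtered. In a predicate it selects the Nth node.
Example 1: `//*[@class='result'][position()=1]` (shorthand: `[1]`) selects the first node with `@class='result'`. Because `//` applies the predicate to each parent's children separately, this returns the first match under every parent. `(//*[@class='result'])[1]` returns only the first match in the whole document.
Example 2: `//*[@class='result'][position()<=2]` selects the first two nodes with `@class='result'`.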
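
The per-parent behavior is easiest to see with matches spread over two lists. The following is a minimal sketch using the JDK's `javax.xml.xpath`. The class `PositionDemo` and its markup (two `ul` elements with `class='result'` items) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PositionDemo {
    // Hypothetical markup: results split across two parents.
    static final String XML = "<root>"
            + "<ul><li class='result'>a1</li><li class='result'>a2</li></ul>"
            + "<ul><li class='result'>b1</li><li class='result'>b2</li><li class='result'>b3</li></ul>"
            + "</root>";

    // Returns the text content of each node matched by an XPath 1.0 expression.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // First match under each parent.
        System.out.println(select("//*[@class='result'][position()=1]")); // [a1, b1]
        // First two matches under each parent.
        System.out.println(select("//*[@class='result'][position()<=2]")); // [a1, a2, b1, b2]
        // Parentheses make the position count over the whole document.
        System.out.println(select("(//*[@class='result'])[1]")); // [a1]
    }
}
```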

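3) `last()` returns the number of nodes in the node-set being filtered, which is also the position of the last node. Subtract from it to count from the end.
Example 1: `//*[@class='result'][last()]` selects the last node with `@class='result'`. As with `position()`, this is evaluated per parent.
Example 2: `//*[@class='result'][last()-1]` selects the second-to-last node with `@class='result'`.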
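
Here is the same kind of sketch for `last()`, again using the JDK's `javax.xml.xpath`. The class `LastDemo` and its two-list markup are hypothetical. It contrasts the per-parent result with the parenthesized, document-wide form.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LastDemo {
    // Hypothetical markup: two parents holding 2 and 3 results.
    static final String XML = "<root>"
            + "<ul><li class='result'>a1</li><li class='result'>a2</li></ul>"
            + "<ul><li class='result'>b1</li><li class='result'>b2</li><li class='result'>b3</li></ul>"
            + "</root>";

    // Returns the text content of each node matched by an XPath 1.0 expression.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("//*[@class='result'][last()]"));   // [a2, b3]
        System.out.println(select("//*[@class='result'][last()-1]")); // [a1, b2]
        // Last match in the whole document.
        System.out.println(select("(//*[@class='result'])[last()]")); // [b3]
    }
}
```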

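4) The `following-sibling` axis selects all siblings that come after the current node.
Example 1: `//div[@class='result']/following-sibling::div` selects every `div` sibling after the `div` whose `@class='result'`. When several nodes match, pick one with `position()`. For example, `//div[@class='result']/following-sibling::div[position()=1]` selects the nearest following `div`.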
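
The following minimal sketch runs both queries with the JDK's `javax.xml.xpath`. The class `FollowingSiblingDemo` and its markup are hypothetical. A `p` element sits between the two `div` siblings to show that the `::div` node test skips it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FollowingSiblingDemo {
    // Hypothetical markup: the anchor div, then mixed siblings.
    static final String XML = "<root>"
            + "<div class='result'>r</div><div>n1</div><p>p</p><div>n2</div>"
            + "</root>";

    // Returns the text content of each node matched by an XPath 1.0 expression.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // All div siblings after the anchor; the <p> is filtered out by "::div".
        System.out.println(select("//div[@class='result']/following-sibling::div")); // [n1, n2]
        // The nearest following div.
        System.out.println(select("//div[@class='result']/following-sibling::div[position()=1]")); // [n1]
    }
}
```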

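5) The `preceding-sibling` axis selects all siblings that come before the current node.
It is used the same way as `following-sibling`. However, it is a reverse axis, so positions count backwards from the current node. `preceding-sibling::div[position()=1]` is the nearest preceding `div`, and `[last()]` is the one farthest away.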
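
The reverse numbering can be checked with a minimal sketch on the JDK's `javax.xml.xpath`. The class `PreceingSiblingDemo` name aside, everything here is illustrative: the class `PrecedingSiblingDemo` and its markup are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrecedingSiblingDemo {
    // Hypothetical markup: two divs and a span before the anchor div.
    static final String XML = "<root>"
            + "<div>p1</div><div>p2</div><span>s</span><div class='result'>r</div>"
            + "</root>";

    // Returns the text content of each node matched by an XPath 1.0 expression.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Reverse axis: position 1 is the div closest to the anchor.
        System.out.println(select("//div[@class='result']/preceding-sibling::div[position()=1]")); // [p2]
        // last() is the div farthest from the anchor.
        System.out.println(select("//div[@class='result']/preceding-sibling::div[last()]")); // [p1]
        // Both preceding divs match; the span is filtered out by "::div".
        System.out.println(select("//div[@class='result']/preceding-sibling::div").size()); // 2
    }
}
```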

II. Filtering text
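1) `substring-before(str1,str2)` returns the part of `str1` that comes before the first occurrence of `str2`.
Example: `substring-before(.//*[@class='c-more_link']/text(),'条')`
This returns the part of `.//*[@class='c-more_link']/text()` before the first `'条'` ("items"). If `'条'` does not occur, it returns an empty string.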
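
Here is a minimal sketch of that example with the JDK's `javax.xml.xpath`, using the string-returning `evaluate` overload. The class `SubstringBeforeDemo` and the link text are hypothetical. The English text `108 items of related news` and the separator `' items'` stand in for the original Chinese text and `'条'`.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class SubstringBeforeDemo {
    // Hypothetical stand-in for a search page's "more" link.
    static final String XML =
            "<root><a class='c-more_link' href='/s?tn=news'>108 items of related news</a></root>";

    // Evaluates an XPath 1.0 expression and returns its result as a string.
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Text before the first " items": the count.
        System.out.println(eval("substring-before(.//*[@class='c-more_link']/text(),' items')")); // 108
        // Separator not present: an empty string.
        System.out.println(eval("substring-before(.//*[@class='c-more_link']/text(),'#')").isEmpty()); // true
    }
}
```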

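2) `substring-after(str1,str2)` works like `substring-before`, but returns the part of `str1` after the first occurrence of `str2`.
Example 1: `substring-after(.//*[@class='c-more_link']/text(),'条')`
This returns the part of `.//*[@class='c-more_link']/text()` after the first `'条'`. If `'条'` does not occur, it returns an empty string.
Example 2: `substring-after(substring-before(.//*[@class='c-more_link']/text(),'新闻'),'条')`
This returns the text between the first `'条'` and the first `'新闻'` ("news") in `.//*[@class='c-more_link']/text()`. The inner call keeps everything before the first `'新闻'`, and the outer call keeps what follows the first `'条'` in that result.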
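
Here is a minimal sketch of both examples with the JDK's `javax.xml.xpath`. It uses the same hypothetical link text as a stand-in for the Chinese original: `'items '` plays the role of `'条'` and `' news'` the role of `'新闻'`. The class `SubstringAfterDemo` is hypothetical too.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class SubstringAfterDemo {
    // Hypothetical stand-in for a search page's "more" link.
    static final String XML =
            "<root><a class='c-more_link' href='/s?tn=news'>108 items of related news</a></root>";

    // Evaluates an XPath 1.0 expression and returns its result as a string.
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Everything after the first "items ".
        System.out.println(eval("substring-after(.//*[@class='c-more_link']/text(),'items ')")); // of related news
        // Nested: cut at " news" first, then keep what follows "items ".
        System.out.println(eval(
                "substring-after(substring-before(.//*[@class='c-more_link']/text(),' news'),'items ')")); // of related
    }
}
```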

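3) `normalize-space()`
This function strips leading and trailing whitespace from a string. It also replaces every run of consecutive whitespace characters inside the string with a single space. Called with no argument, it works on the string value of the current node.
Example: `normalize-space(.//*[contains(@class,'c-summary c-row ')])`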
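
Here is a minimal sketch of the example with the JDK's `javax.xml.xpath`. The class `NormalizeSpaceDemo` is hypothetical, as is the sample `div`, whose text is padded with spaces, tabs and newlines. The first line prints the raw string value for comparison.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class NormalizeSpaceDemo {
    // Hypothetical markup with messy whitespace inside the text.
    static final String XML =
            "<root><div class='c-summary c-row '>\n   Hello,\t XPath\n   world  </div></root>";

    // Evaluates an XPath 1.0 expression and returns its result as a string.
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Raw string value, whitespace included.
        System.out.println("[" + eval("string(.//*[contains(@class,'c-summary c-row ')])") + "]");
        // Trimmed, with inner whitespace runs collapsed to single spaces.
        System.out.println("[" + eval("normalize-space(.//*[contains(@class,'c-summary c-row ')])") + "]"); // [Hello, XPath world]
    }
}
```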

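4) `translate(string,str1,str2)`
Each character of `string` that appears in `str1` is replaced by the character at the same position in `str2`. If `str2` has no character at that position (because it is shorter than `str1`), that character is removed from `string`. Characters that do not appear in `str1` are left unchanged.
Example: `translate('12:30','03','54')`  Result: `'12:45'`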
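
The replace and delete rules can be checked with the JDK's `javax.xml.xpath`. This minimal sketch evaluates against a trivial placeholder document, since these expressions only use string literals. The class `TranslateDemo` is hypothetical. The last two calls show deletion, where `str2` is shorter than `str1`.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class TranslateDemo {
    // Placeholder context document; translate() here only uses literals.
    static final String XML = "<root/>";

    // Evaluates an XPath 1.0 expression and returns its result as a string.
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // '0' -> '5', '3' -> '4'
        System.out.println(eval("translate('12:30','03','54')"));     // 12:45
        // Empty str2: every ':' and '0' is deleted.
        System.out.println(eval("translate('12:30',':0','')"));       // 123
        // '0'->'a', '1'->'b', '2'->'c', '3' has no partner and is deleted.
        System.out.println(eval("translate('12:30','0123','abc')"));  // bc:a
    }
}
```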

III. Concatenating values

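1) `concat()` joins two or more strings into one.
Example: `concat('http://baidu.com',.//*[@class='c-more_link']/@href)` prepends the domain to the link's `href`. This turns a relative link into an absolute URL.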
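
Here is a minimal sketch of that example with the JDK's `javax.xml.xpath`. The class `ConcatDemo` is hypothetical, as is the relative `href` value `/s?tn=news`. The attribute node-set is converted to its string value before it is joined.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ConcatDemo {
    // Hypothetical link with a relative href.
    static final String XML =
            "<root><a class='c-more_link' href='/s?tn=news'>more</a></root>";

    // Evaluates an XPath 1.0 expression and returns its result as a string.
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Prefix the domain to the relative href.
        System.out.println(eval("concat('http://baidu.com',.//*[@class='c-more_link']/@href)"));
        // http://baidu.com/s?tn=news
    }
}
```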

Reference: [http://www.gooseeker.com/doc/thread-1852-1-1.html](http://www.gooseeker.com/doc/thread-1852-1-1.html "http://www.gooseeker.com/doc/thread-1852-1-1.html")
