GAS Library | Html parse for web scraping

How to add

You can add this library with the key below (for the legacy script editor).

MJqa3Uidm9a8fNR_0snRPwKWZ8rqdjnSl

If you want to know how to use this key, please check this guidance.

How to use

1. Get the HTML text with UrlFetchApp, PhantomJsCloud (if the page needs scripts to run before the HTML is available), or another tool.
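
For a static page, this step might look like the sketch below. The helper name fetchHtmlText and its optional second parameter are illustrative assumptions, not part of this library; the parameter only exists so the function can be tried outside Apps Script with a stand-in for UrlFetchApp.

```javascript
// Hypothetical helper (not part of the library): download a page and
// return its HTML as a string.
// fetchApp is optional and defaults to the built-in UrlFetchApp service.
function fetchHtmlText(url, fetchApp) {
  var app = fetchApp || UrlFetchApp;
  // muteHttpExceptions lets us check the status code ourselves
  // instead of having fetch() throw on an error response.
  var response = app.fetch(url, { muteHttpExceptions: true });
  var code = response.getResponseCode();
  if (code !== 200) {
    throw new Error('Request to ' + url + ' failed with HTTP status ' + code);
  }
  return response.getContentText();
}
```

Pages that build their content with client-side scripts will usually come back incomplete this way, which is where a rendering service such as PhantomJsCloud comes in.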

2. Create an Html object from the HTML text as shown below.
(Pass the HTML text string as the argument.)

var html = Html.parse(htmlText); // htmlText: the HTML string from step 1; returns an Html object
console.log(html.tree()); // check the structure of the HTML and get the XPath of each Element

3. Use the Html class methods to get the Element you want to reach.
(Pass an XPath string as the argument, for example: '/body/div/header[1]/div')

var elm = html.getElmX(xpath); // xpath: an XPath string; returns an Element object

4. Use the Element class methods to get the details of this Element.

var tagName = elm.tag(); // get the tag name
var attObj = elm.att(); // get the attributes object
var innerText = elm.innerText(); // get the inner text (not including inner HTML)
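
Steps 2 to 4 can be combined into one small function, sketched below. The name describeElement, the optional htmlLib parameter, and the check for a missing Element are assumptions made for illustration: this page does not say what getElmX returns when the XPath matches nothing, so the check is purely defensive.

```javascript
// Hypothetical helper (not part of the library): parse htmlText and
// summarize the Element found at xpath.
// htmlLib is optional and defaults to the library identifier Html.
function describeElement(htmlText, xpath, htmlLib) {
  var lib = htmlLib || Html;
  var html = lib.parse(htmlText); // step 2: build the Html object
  var elm = html.getElmX(xpath);  // step 3: look up one Element by XPath
  // Assumption: a non-matching XPath yields a falsy value.
  if (!elm) {
    throw new Error('No Element found at XPath: ' + xpath);
  }
  // step 4: read the details of the Element
  return {
    tag: elm.tag(),
    attributes: elm.att(),
    text: elm.innerText()
  };
}
```

In a script with the library added as Html, calling describeElement(htmlText, '/body/div/header[1]/div') would return the tag name, attributes, and inner text of that Element in one object.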

Class Html

Access all Elements included in this Html.

[table id=6 /]

Class Element

Get the details of this Element and access all Elements located directly under it.

[table id=7 /]
