Reference: Multithreaded download of USGS Lidar Explorer Map point cloud data - 行走的蓑衣客 - cnblogs (博客园):
https://www.cnblogs.com/suoyike1001/p/16998574.html
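Data acquisition
1. Open the data download site:
USGS Lidar Explorer Map: https://apps.nationalmap.gov/lidar-explorer/#/
(In testing, searches work from mainland China without a VPN, although loading may be a little slow.)
2. Select DEM data

3. Search for DEM data

4. Review the search results and download the txt file that lists the download links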


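5. Process the result list: pick the tiles that fully cover your area and are as consistent as possible (for example, if you use 2018 data, take every tile from 2018).
Create a new Excel workbook to hold the link data and save it in .xls format:
The first row is the header; data starts at the second row.
Fill in every entry that needs to be downloaded. The download script reads the name from the second column and the link from the third column.
For the name column, you can take the last field of each link (Excel's Ctrl+E flash fill shortcut does this quickly).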
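The name column can also be derived in code instead of with flash fill. The following is a minimal sketch, assuming the downloaded txt file contains one URL per line and that the acquisition year appears somewhere in each URL. The function name `rows_from_links` and the example.com links are hypothetical and only serve as illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def rows_from_links(lines, year=None):
    """Turn raw link lines into (name, url) rows for the spreadsheet.

    Assumption: one URL per line; `year`, if given, must appear in the URL.
    """
    rows = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        if year is not None and str(year) not in url:
            continue  # keep only links that mention the requested year
        # The name is the last path field of the URL without its extension.
        name = PurePosixPath(urlparse(url).path).stem
        rows.append((name, url))
    return rows


if __name__ == "__main__":
    # Hypothetical links, for illustration only.
    sample = [
        "https://example.com/dem/2018/tile_a_2018.tif",
        "https://example.com/dem/2016/tile_b_2016.tif",
        "",
    ]
    for name, url in rows_from_links(sample, year=2018):
        print(name, url, sep="\t")
```

The output is tab-separated, so it can be pasted into the name and link columns of the workbook.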
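Downloading the data
1. Copy the code below and install the libraries it imports, such as xlrd and requests (pip install xlrd, or pip install xlrd -i https://pypi.tuna.tsinghua.edu.cn/simple/ to use the Tsinghua mirror). Note that xlrd 2.0 and later reads only .xls files.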
```python
import logging
import random
import threading
import time

import requests
import xlrd

# Show logging.info() messages on the console.
logging.basicConfig(level=logging.INFO)

# Workbook that holds the download links (.xls format).
a = xlrd.open_workbook('20240202.xls')
sht = a.sheets()[0]
start = 1  # row 0 is the header, so data starts at row 1
nrows = sht.nrows

# A User-Agent is picked at random from this list for every request.
user_agent_list = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; WOW64) Gecko/20100101 Firefox/61.0",
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15",
]

finished = 0                      # number of completed downloads
finished_lock = threading.Lock()  # guards `finished` across threads


def fetch(url, filename):
    """Download url and save it as <filename>.<extension>."""
    global finished
    headers = {'User-Agent': random.choice(user_agent_list)}
    r = requests.get(url, headers=headers)
    r.raise_for_status()  # do not save an error page as data
    url2 = url[-3:]  # file extension: the last three characters of the URL
    dir = r"F:\MAP\jihe0202\\" + filename + "." + url2
    with open(dir, "wb") as code:
        code.write(r.content)
    print(url)
    # Count finished downloads instead of reading the loop variable,
    # which has usually reached its last value by the time a thread ends.
    with finished_lock:
        finished += 1
        jindu = finished / (nrows - start) * 100
    print("Download progress:", jindu, "%")


t1 = time.time()
t_list = []

for i in range(start, nrows):
    url = sht.cell(i, 2).value  # third column: download link
    if url:
        logging.info(url)
        filename = sht.cell(i, 1).value  # second column: file name
        t = threading.Thread(target=fetch, args=(url, filename))
        t_list.append(t)
        t.start()

# Wait for every download thread to finish.
for t in t_list:
    t.join()

print("Multithreaded crawler elapsed time:", time.time() - t1)
```
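The script above starts one thread per spreadsheet row, so a long list opens many connections at once. As a hedged alternative, the sketch below caps concurrency with `concurrent.futures.ThreadPoolExecutor` from the standard library. The function name `download_all`, the `download` callback parameter, and the `max_workers` default are assumptions for illustration; in practice `download` would be a function like `fetch`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(jobs, download, max_workers=8):
    """Run download(url, filename) for every job with at most max_workers threads.

    jobs: iterable of (url, filename) pairs.
    Returns (succeeded, failed): a list of filenames and a list of
    (filename, exception) pairs.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its filename so results can be reported.
        futures = {pool.submit(download, url, name): name for url, name in jobs}
        for done_count, future in enumerate(as_completed(futures), start=1):
            name = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                succeeded.append(name)
            except Exception as exc:
                failed.append((name, exc))
            print(f"Progress: {done_count}/{len(futures)}")
    return succeeded, failed


if __name__ == "__main__":
    # Stand-in downloader so the sketch runs without network access.
    def fake_download(url, filename):
        if not url.startswith("https://"):
            raise ValueError("unsupported URL: " + url)

    ok, bad = download_all(
        [("https://example.com/a.tif", "a"), ("ftp://example.com/b.tif", "b")],
        fake_download,
    )
    print("succeeded:", ok)
    print("failed:", [name for name, _ in bad])
```

Collecting failures in a list, rather than letting a worker die silently, makes it easy to retry only the files that did not download.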
2. Change the Excel file path:
```python
a = xlrd.open_workbook('20240202.xls')
```
3. Choose the download location by changing the folder in the save path inside fetch:
```python
dir = r"F:\MAP\jihe0202\\" + filename + "." + url2
```
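Opening the output file fails if the target folder does not exist yet. Below is a minimal sketch of building the save path with `pathlib` and creating the folder on demand. The helper name `build_save_path` is hypothetical, and taking the extension from the URL path (rather than from its last three characters) is a swapped-in choice, not what the script above does.

```python
import tempfile
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def build_save_path(folder, filename, url):
    """Return folder/filename.<ext>, creating folder if needed.

    The extension is taken from the URL path, e.g. ".tif" or ".laz".
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    ext = PurePosixPath(urlparse(url).path).suffix
    return folder / (filename + ext)


if __name__ == "__main__":
    # Hypothetical folder and URL, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        target = build_save_path(Path(tmp) / "jihe", "tile_a",
                                 "https://example.com/dem/tile_a.tif")
        print(target.name, target.parent.is_dir())
```

Reading the suffix from the URL path also handles extensions that are not exactly three characters long.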
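4. Once everything is set, run the script.
As long as no error is reported, the download has generally started. Wait a while; when the script prints the total elapsed time, the download is complete.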
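The elapsed time is printed even if an individual download thread failed, so it can be worth checking the output folder afterwards. The sketch below lists expected names that have no matching non-empty file; the function name `find_incomplete` is hypothetical, and matching files by their stem (name without extension) is an assumption about how the files were saved.

```python
import tempfile
from pathlib import Path


def find_incomplete(folder, expected_names):
    """Return expected names with no matching non-empty file in folder.

    A file matches a name when its stem (name without extension) equals it.
    """
    sizes = {}
    for path in Path(folder).iterdir():
        if path.is_file():
            # Keep the largest size seen for each stem.
            sizes[path.stem] = max(sizes.get(path.stem, 0), path.stat().st_size)
    return [name for name in expected_names if sizes.get(name, 0) == 0]


if __name__ == "__main__":
    # Temporary folder with one good file and one empty file, for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "tile_a.tif").write_bytes(b"data")
        (Path(tmp) / "tile_b.tif").write_bytes(b"")
        print(find_incomplete(tmp, ["tile_a", "tile_b", "tile_c"]))
```

The returned names can be copied into a new spreadsheet so that only the missing files are downloaded again.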