[Python Debugging] Beautiful Soup Error - AttributeError: 'NoneType' object has no attribute 'text'
Error
I was writing code to crawl a website when this error occurred: AttributeError: 'NoneType' object has no attribute 'text'
My Code:
import requests
from bs4 import BeautifulSoup
param = {
    'isDetailSearch': 'N',
    'searchGubun': 'true',
    'viewYn': 'OP',
    'order': '/DESC',
    'onHanja': 'false',
    'strSort': 'RANK',
    'iStartCount': 0,
    'fsearchMethod': 'search',
    'sflag': 1,
    'isFDetailSearch': 'N',
    'pageNumber': 1,
    'icate': 're_a_kor',
    'colName': 're_a_kor',
    'pageScale': 10,
    'isTab': 'Y'
}
response = requests.get("https://www.riss.kr/search/Search.do", params=param)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
articles = soup.select('.srchResultListW > ul > li')
for article in articles:
    title = article.select_one('.title > a').text
    link = "https://www.riss.kr" + article.select_one('.title > a').attrs.get('href')
    # Fetch the detail page of each article (no headers are sent here)
    response = requests.get(link)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    # The error is raised here: select_one() returns None when nothing matches
    press = soup.select_one('.infoDetailL > ul > li:nth-of-type(2) > div').text
Error Message:
The message shows that the problem is on line 38 of my script, which is press = soup.select_one(...).text
Solution
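The AttributeError occurred because the div I was selecting does not exist in the soup variable. When nothing matches the selector, select_one returns None, and None has no text attribute. So, why does soup not have the div? It's because the web server refused my request.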
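To see the mechanism in isolation, here is a minimal sketch on a made-up HTML snippet (not the real RISS page). It shows select_one returning None and one possible way to guard against it. The helper name text_or_none is hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a page that lacks the expected element.
html = "<div class='infoDetailL'><ul><li>only one item</li></ul></div>"
soup = BeautifulSoup(html, "html.parser")

selector = ".infoDetailL > ul > li:nth-of-type(2) > div"
element = soup.select_one(selector)  # there is no second <li>, so nothing matches
print(element)  # None

try:
    element.text
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'text'


def text_or_none(soup, selector):
    """Hypothetical helper: text of the first match, or None if nothing matches."""
    found = soup.select_one(selector)
    return found.get_text(strip=True) if found is not None else None


print(text_or_none(soup, selector))                  # None
print(text_or_none(soup, ".infoDetailL > ul > li"))  # only one item
```

A guard like this does not fix the refused request, but it turns the crash into a value that can be checked and logged.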
There are three ways the server can refuse a request:
Status code 200 → BUT the returned HTML is not what we want
Status code 4xx
Infinite loop
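The first two cases can be told apart by looking at the status code and at whether the HTML contains something we expect. The following is only a sketch: the function name diagnose_response and its labels are made up for illustration, and the marker 'infoDetailL' is the class name used in my selector.

```python
def diagnose_response(status_code, html, expected_marker):
    """Roughly classify a response; the returned labels are made up."""
    if 400 <= status_code < 500:
        return "client-error"       # Status code 4xx
    if status_code == 200 and expected_marker not in html:
        return "unexpected-html"    # 200, but not the page we wanted
    if status_code == 200:
        return "ok"
    return "other"


print(diagnose_response(403, "", "infoDetailL"))
# client-error
print(diagnose_response(200, "<html><body>blocked</body></html>", "infoDetailL"))
# unexpected-html
print(diagnose_response(200, "<div class='infoDetailL'></div>", "infoDetailL"))
# ok
```

With requests, the inputs would be response.status_code and response.text. If the "infinite loop" case means a request that never finishes (an assumption on my part), passing timeout= to requests.get makes it raise an exception instead of waiting forever.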
To solve this issue, we can add request headers such as User-Agent or Referer.
User-Agent identifies the client. Setting it to a browser-like value makes the request look like it comes from a web browser.
Referer tells the server the URL of the page that the request came from.
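To check what requests sends when no header is given, the request can be prepared without actually sending it. This is a small sketch, not a full diagnosis: the Referer value here is just the search URL used as a placeholder, and whether the server really rejects the default User-Agent is my assumption.

```python
import requests

# The User-Agent that requests uses when none is given.
print(requests.utils.default_user_agent())  # starts with "python-requests/"

session = requests.Session()
url = "https://www.riss.kr/search/Search.do"

# prepare_request() builds the outgoing request but does not send it.
default_request = session.prepare_request(requests.Request("GET", url))
print(default_request.headers["User-Agent"])

# Placeholder header values for illustration.
header = {"User-Agent": "Mozilla/5.0", "Referer": url}
custom_request = session.prepare_request(requests.Request("GET", url, headers=header))
print(custom_request.headers["User-Agent"])  # Mozilla/5.0
print(custom_request.headers["Referer"])
```

The default value openly says the request comes from a Python script, which is an easy thing for a server to filter on.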
You can find the header values like this:
Open DevTools in your web browser.
Go to the Network tab.
Scroll to the top and select the first request.
Select “Headers”.
- Now you can check the User-Agent & Referer.
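Instead of pasting the long Referer URL by hand, it can also be built from the search URL and the search parameters. This is a sketch under assumptions: build_header is a hypothetical helper, only a subset of the parameters is used, and I have not verified that the server accepts a Referer with fewer parameters than the browser sends.

```python
from urllib.parse import urlencode

search_url = "https://www.riss.kr/search/Search.do"
# A subset of the search parameters, for illustration only.
param = {"isDetailSearch": "N", "order": "/DESC", "pageNumber": 1, "isTab": "Y"}


def build_header(user_agent, base_url, params):
    """Hypothetical helper: build a headers dict with a Referer made from params."""
    referer = f"{base_url}?{urlencode(params)}"
    return {"User-Agent": user_agent, "Referer": referer}


header = build_header("Mozilla/5.0", search_url, param)
print(header["Referer"])
# https://www.riss.kr/search/Search.do?isDetailSearch=N&order=%2FDESC&pageNumber=1&isTab=Y
```

The resulting dict can be passed as headers=header to requests.get.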
Solution Code:
import requests
from bs4 import BeautifulSoup
param = {
    'isDetailSearch': 'N',
    'searchGubun': 'true',
    'viewYn': 'OP',
    'order': '/DESC',
    'onHanja': 'false',
    'strSort': 'RANK',
    'iStartCount': 0,
    'fsearchMethod': 'search',
    'sflag': 1,
    'isFDetailSearch': 'N',
    'pageNumber': 1,
    'icate': 're_a_kor',
    'colName': 're_a_kor',
    'pageScale': 10,
    'isTab': 'Y',
}
response = requests.get("https://www.riss.kr/search/Search.do", params=param)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
articles = soup.select('.srchResultListW > ul > li')
# HEADER
header = {
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'https://www.riss.kr/search/Search.do?isDetailSearch=N&searchGubun=true&viewYn=OP&queryText=&strQuery=%ED%8C%A8%EC%85%98+%EC%9D%B8%EA%B3%B5%EC%A7%80%EB%8A%A5&exQuery=&exQueryText=&order=%2FDESC&onHanja=false&strSort=RANK&p_year1=&p_year2=&iStartCount=0&orderBy=&mat_type=&mat_subtype=&fulltext_kind=&t_gubun=&learning_type=&ccl_code=&inside_outside=&fric_yn=&db_type=&image_yn=&gubun=&kdc=&ttsUseYn=&l_sub_code=&fsearchMethod=search&sflag=1&isFDetailSearch=N&pageNumber=1&resultKeyword=&fsearchSort=&fsearchOrder=&limiterList=&limiterListText=&facetList=&facetListText=&fsearchDB=&icate=re_a_kor&colName=re_a_kor&pageScale=100&isTab=Y&regnm=&dorg_storage=&language=&language_code=&clickKeyword=&relationKeyword=&query=%ED%8C%A8%EC%85%98+%EC%9D%B8%EA%B3%B5%EC%A7%80%EB%8A%A5'
}
for article in articles:
    title = article.select_one('.title > a').text
    link = "https://www.riss.kr" + article.select_one('.title > a').attrs.get('href')
    # Set headers=header
    response = requests.get(link, headers=header)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    press = soup.select_one('.infoDetailL > ul > li:nth-of-type(2) > div').text
I added a header containing 'User-Agent' and 'Referer', and passed it to the request with response = requests.get(link, headers=header)