{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# -*- coding: utf-8 -*-\n",
"\"\"\"\n",
"Created on Tue Jan 12 20:11:10 2021"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@author: Usuario\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" MechanicalSoup ##"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" pip install MechanicalSoup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting MechanicalSoup\n",
" Downloading MechanicalSoup-1.1.0-py3-none-any.whl (19 kB)\n",
"Requirement already satisfied: lxml in /home/mydoctor/anaconda3/lib/python3.8/site-packages (from MechanicalSoup) (4.6.3)\n",
"Requirement already satisfied: beautifulsoup4>=4.7 in /home/mydoctor/anaconda3/lib/python3.8/site-packages (from MechanicalSoup) (4.10.0)\n",
"Requirement already satisfied: requests>=2.22.0 in /home/mydoctor/anaconda3/lib/python3.8/site-packages (from MechanicalSoup) (2.26.0)\n",
"Requirement already satisfied: soupsieve>1.2 in /home/mydoctor/anaconda3/lib/python3.8/site-packages (from beautifulsoup4>=4.7->MechanicalSoup) (2.2.1)\n",
"Requirement already satisfied: idna<4,>=2.5 in /home/mydoctor/anaconda3/lib/python3.8/site-packages (from requests>=2.22.0->MechanicalSoup) (3.2)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /home/mydoctor/anaconda3/lib/python3.8/site-packages (from requests>=2.22.0->MechanicalSoup) (2.0.4)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/mydoctor/anaconda3/lib/python3.8/site-packages (from requests>=2.22.0->MechanicalSoup) (1.26.7)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /home/mydoctor/anaconda3/lib/python3.8/site-packages (from requests>=2.22.0->MechanicalSoup) (2021.10.8)\n",
"Installing collected packages: MechanicalSoup\n",
"Successfully installed MechanicalSoup-1.1.0\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install MechanicalSoup"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import mechanicalsoup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creamos un objeto Browser"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"browser = mechanicalsoup.Browser()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Podemos solicitar una pagina de internet "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"url = \"http://olympus.realpython.org/login\"\n",
"page = browser.get(url)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Obtenemos un objeto que almacena la respuesta de la URL solicitada"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"200"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"page.status_code"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"
\n",
"En este caso el numero 200 representa el codigo de estado devuelto por la solicitud.
\n",
" Significa que la solicitud se ha realizado correctamente
\n",
"Otros codigos habituales:
\n",
" 404: La URL no existe
\n",
" 500: Ha ocurrido un error en el servidor al realizar la solicitud
\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"MechanicalSoup puede usar la libreria BeautifulSoup para analizar el HTML obtenido "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"bs4.BeautifulSoup"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(page.soup)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Podemos ver el HTML mediante el atributo .soup"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"