{ "cells": [ { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# -*- coding: utf-8 -*-\n", "\"\"\"\n", "Created on Thu Jan 14 11:28:44 2021\n", "@author: Usuario\n", "\n", "## Interacting with web pages in real time ##\n", "\n", "# Goal: scrape a web page that simulates rolling a die\n", "\"\"\"\n", "import mechanicalsoup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, inspect the page's source code to determine which element contains the result of the roll.\n", "That element has the id `result`; in the source viewed here it contains the value 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the Browser object." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "browser = mechanicalsoup.Browser()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Request the desired URL." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "page = browser.get(\"http://olympus.realpython.org/dice\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Find the element with id=result.
\n", "The CSS ID selector (#) tells select() to match elements by the value of their id attribute;\n", "select() returns a list, so we take its first item." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "tag = page.soup.select(\"#result\")[0]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "result = tag.text" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The result of your dice roll is: 3\n" ] } ], "source": [ "print(f\"The result of your dice roll is: {result}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get results continuously, create a loop that reloads the page on every iteration.\n", "We add a wait between rolls." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Example: the time.sleep() function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"I'm about to wait for five seconds...\")\n", "time.sleep(5)\n", "print(\"Done waiting!\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "import mechanicalsoup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "browser = mechanicalsoup.Browser()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in range(4):\n", "    page = browser.get(\"http://olympus.realpython.org/dice\")\n", "    tag = page.soup.select(\"#result\")[0]\n", "    result = tag.text\n", "    print(f\"The result of your dice roll is: {result}\")\n", "    time.sleep(10)  # Wait between rolls\n", "\n", "# Next, optimize the code so it does not wait 10 seconds after the fourth roll\n", "\n", "import time\n", "import mechanicalsoup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "browser = mechanicalsoup.Browser()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in range(4):\n", "    page = browser.get(\"http://olympus.realpython.org/dice\")\n", "    tag = page.soup.select(\"#result\")[0]\n", "    result = tag.text\n", "    print(f\"The result of your dice roll is: {result}\")\n", "\n", "    # Wait 10 seconds unless this is the last roll\n", "    if i < 3:\n", "        time.sleep(10)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", 
"mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 4 }