{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NLP (Natural Language Processing) with Python - Recipes\n", " \n", "**Requirements: you need to install the NLTK library and download the stopwords corpus. Conda and Google Colab include the NLTK package by default. If NLTK is not installed, run the following cell.**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Run this cell only if NLTK is not installed\n", "# Uncomment the following line to install the library:\n", "\n", "#!conda install nltk " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import nltk" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nltk.download_shell() \n", "# In the interactive shell, choose d) Download\n", "# and enter the identifier: stopwords" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.Downloading an external corpus\n", "We will download the Cornell CS movie review dataset, which contains positive and negative reviews of different films. The data can be downloaded from:\n", "http://www.cs.cornell.edu/people/pabo/movie-review-data/mix20_rand700_tokens_cleaned.zip" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from nltk.corpus import CategorizedPlaintextCorpusReader" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The corpus is already categorized (it consists of multiple text files split into positive and negative reviews), which is why we use **CategorizedPlaintextCorpusReader** here. Later on we will work with data that is not categorized. 
The CategorizedPlaintextCorpusReader class lets us load the data while preserving the categorization." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['neg', 'pos']\n", "['neg/cv001_tok-19324.txt', 'neg/cv002_tok-3321.txt', 'neg/cv003_tok-13044.txt', 'neg/cv004_tok-25944.txt', 'neg/cv005_tok-24602.txt', 'neg/cv006_tok-29539.txt', 'neg/cv007_tok-11669.txt', 'neg/cv008_tok-11555.txt', 'neg/cv009_tok-19587.txt', 'neg/cv010_tok-2188.txt', 'neg/cv011_tok-7845.txt', 'neg/cv012_tok-26965.txt', 'neg/cv013_tok-14854.txt', 'neg/cv014_tok-12391.txt', 'neg/cv015_tok-23730.txt', 'neg/cv016_tok-16970.txt', 'neg/cv017_tok-27221.txt', 'neg/cv018_tok-11502.txt', 'neg/cv019_tok-2003.txt', 'neg/cv020_tok-13096.txt', 'neg/cv021_tok-29141.txt', 'neg/cv022_tok-25633.txt', 'neg/cv023_tok-25625.txt', 'neg/cv024_tok-22867.txt', 'neg/cv025_tok-12991.txt', 'neg/cv026_tok-23590.txt', 'neg/cv027_tok-20123.txt', 'neg/cv028_tok-25883.txt', 'neg/cv029_tok-27815.txt', 'neg/cv030_tok-23788.txt', 'neg/cv031_tok-25886.txt', 'neg/cv032_tok-9567.txt', 'neg/cv033_tok-13710.txt', 'neg/cv034_tok-25395.txt', 'neg/cv035_tok-22978.txt', 'neg/cv036_tok-9704.txt', 'neg/cv037_tok-18875.txt', 'neg/cv038_tok-25639.txt', 'neg/cv039_tok-11790.txt', 'neg/cv040_tok-24758.txt', 'neg/cv041_tok-17672.txt', 'neg/cv042_tok-23615.txt', 'neg/cv043_tok-12173.txt', 'neg/cv044_tok-13701.txt', 'neg/cv045_tok-13307.txt', 'neg/cv046_tok-14467.txt', 'neg/cv047_tok-26750.txt', 'neg/cv048_tok-14254.txt', 'neg/cv049_tok-24355.txt']\n" ] } ], "source": [ "reader = CategorizedPlaintextCorpusReader(r'/home/mydoctor/Documents/03.Trabajos/01.C2B/21.Deep Learning - C2B (15h)/scripts/datos/movies/tokens', r'.*\\.txt', cat_pattern=r'(\\w+)/*', encoding='cp1252')\n", "print(reader.categories())\n", "print(reader.fileids()[1:50])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We build the lists of file names for the positive and the negative reviews." ] }, { 
"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "posFiles = reader.fileids(categories='pos')\n", "negFiles = reader.fileids(categories='neg')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['pos/cv001_tok-10180.txt',\n", " 'pos/cv002_tok-12931.txt',\n", " 'pos/cv003_tok-8338.txt',\n", " 'pos/cv004_tok-29856.txt',\n", " 'pos/cv005_tok-26110.txt',\n", " 'pos/cv006_tok-28887.txt',\n", " 'pos/cv007_tok-14417.txt',\n", " 'pos/cv008_tok-15650.txt',\n", " 'pos/cv009_tok-6385.txt']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "posFiles[1:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we randomly pick the names of 2 files (one with a negative review and one with a positive review)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pos/cv420_tok-22064.txt\n", "neg/cv237_tok-26058.txt\n" ] } ], "source": [ "from random import randint\n", "fileP = posFiles[randint(0, len(posFiles) - 1)]\n", "# Index the negative list with its own length, not the length of posFiles\n", "fileN = negFiles[randint(0, len(negFiles) - 1)]\n", "print(fileP)\n", "print(fileN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We print each file, starting a new line after every period..." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "starring : matt damon , ben affleck , linda fiorentino , chris rock , alan rickman , janeane garofalo , jason lee , jason mewes , kevin smith director : kevin smith written by : kevin smith in the north america of the late 90 ' s , a time and place wherein the description \" devout catholic \" is an oxymoron , you wouldn ' t think a film like dogma would have had trouble finding a distributor , much less require the painfully self - embarrassed disclaimer which precedes it . 
\n", "surely even the most pious , guilt - ridden rc couldn ' t possibly take offense to something smith rightly calls a \" comic fantasy , \" something so purposely irreverent it rivals the great schism for sheer , laugh - out - loud ludicrousness . \n", "i mean , come on . \n", "we ' re talking about a movie that features damon and affleck as fallen angels ( complete with really fake - looking wings ) who discover a policy loophole that will either allow them to get back into heaven with a clean slate -- or cause the obliteration of everything , depending on whom you ask . \n", "we ' re talking about dusky - voiced fiorentino as a heroine whose family tree contains more ' begot ' s than the book of genesis , and chris rock as \" rufus , the thirteenth apostle , \" who insists he was written out of the bible because he ' s black . \n", "we ' re talking about inspired bits of ironic casting like alanis morissette as god , or better yet , george carlin as a cardinal who interprets jesus ' \" let the children come unto me , \" as \" get ' em when they ' re young . \n", "\" we ' re talking about stoners jay and silent bob ( jason mewes and smith , who show up as these same characters in every smith movie ) as prophets , fer chrissakes . \n", "by all rights , a film this goofy should have inspired nothing more than simple indifference , and not just in catholics . \n", "yet for the most part , it all clicks . \n", "damon and affleck are ideal as millennia - old buddies ; their scenes together have a seemingly effortless , comfortable ease , and they even get to poke fun at speculation about their offscreen relationship . \n", "certain corporate idolaters are slammed mercilessly , to hilarious effect . \n", "even jay ' s juvenile , expletive - riddled banter , annoying as it can be , often provides a shockingly funny counterpoint when the going threatens to get serious . 
\n", "the film does lose some steam in the late going , however , and the ending doesn ' t match the promise of the first half . \n", "still , dogma ends up being a thoughtful and heartfelt expression of smith ' s simple message about the difference between ideas and beliefs , faith and religion , and why we ' re all here . \n" ] } ], "source": [ "for w in reader.words(fileP):\n", "    print(w + ' ', end='')\n", "    # Compare by value: 'is' tests object identity and triggers a SyntaxWarning\n", "    if w == '.':\n", "        print()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "filmcritic . \n", "com presents a review from staff member robert strohmeyer . \n", "you can find the review with full credits at http : // filmcritic . \n", "com / misc / emporium . \n", "nsf / 2a460f93626cd4678625624c007f2b46 / a798b7c8e97eb65c8825695900042829 ? opendocument after 40 continuous years of off - broadway performances , the musical sensation of tom jones and harvey schmidt comes inexplicably to the big screen . \n", "that ' s right , folks . \n", "all the dancing , singing , and mindbogglingly stupid antics of off - broadway ' s longest - running embarrassment can now be experienced at a cinema near you . \n", "set deep in rural america , this is the story of two neighboring fathers who fake a feud in order to trick their children into courtship . \n", "of course , the young man and woman ( played by joe mcintyre and jean louisa kelly , respectively ) are easily duped and everything is going as planned . \n", "that is , until the circus comes to town . \n", "and that ' s when the moronic singing starts . 
\n", "perhaps the most baffling thing about this incomprehensible production is that it languished for about five years on mgm shelves , nearly ( but not nearly enough ) not making it to the screen , until francis ford coppola offered to have a look at it at much risk to his own sanity . \n", "sadly , even ffc ' s magical touch could not spare us the horror the fantasticks has in store . \n", "the unique combination of cheesy choreography , inept dialog , and insanely ridiculous music , so brilliantly captured under michael ritchie ' s direction , is surely enough to have audiences howling all the way to their cars within the first twenty minutes . \n", "this is the sort of film our grandmothers might have loved . \n", "well , probably not my grandmother but maybe yours , if she was an invalid . \n", "one credit i can pay this flick is that , against all probability , its makers managed to cram more moments of ungodly torture into an 86 - minute musical than i could ever have thought possible . \n", "the saddest thing about the fantasticks is the inclusion in its cast of cabaret ' s memorable emcee , joel grey . \n", "a longtime fan of grey ' s performances , i will be forever scarred by this experience . \n", "to be sure , jean louisa kelly and joe mcintyre will live long enough to regret their roles in this picture . \n", "teller ( of penn & teller ) , on the other hand , might very well consider this the crowning achievement of a career spent frolicking silently about in the shadow of a lowbrow windbag . \n", "the fantasticks might well serve as a worthwhile sacrifice in american filmmaking , demonstrating for all time the god - awful stupidity of silver screen musicals . \n", "use it as you would a roadside accident ; gawk thoroughly as you cruise slowly by , praying for all your life is worth that this sort of atrocity never happens to anyone , ever again . 
\n" ] } ], "source": [ "for w in reader.words(fileN):\n", "    print(w + ' ', end='')\n", "    # Compare by value: 'is' tests object identity and triggers a SyntaxWarning\n", "    if w == '.':\n", "        print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.Counting all the 'wh' words\n", "\n", "Here we use the Brown corpus, which is available through NLTK. It contains roughly 500 texts categorized into 15 genres (news, humor, ...)." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import nltk\n", "from nltk.corpus import brown" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We download the dataset." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[nltk_data] Downloading package brown to /home/mydoctor/nltk_data...\n", "[nltk_data] Package brown is already up-to-date!\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nltk.download('brown')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The categories available in the dataset are:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['adventure', 'belles_lettres', 'editorial', 'fiction', 'government', 'hobbies', 'humor', 'learned', 'lore', 'mystery', 'news', 'religion', 'reviews', 'romance', 'science_fiction']\n" ] } ], "source": [ "print(brown.categories())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We select 3 genres, as well as the words we want to count." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "generos = ['fiction', 'humor', 'romance']\n", "palabraswh = ['what', 'which', 'how', 'why', 'when', 'where', 'who']" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Analyzing 'fiction'\n", "['Thirty-three', 'Scotty', 'did', 'not', 'go', 'back', ...]\n", "\n", "Analyzing 'humor'\n", "['It', 'was', 'among', 'these', 'that', 'Hinkle', ...]\n", "\n", "Analyzing 'romance'\n", "['They', 'neither', 'liked', 'nor', 'disliked', 'the', ...]\n" ] } ], "source": [ "for genero in generos:\n", "    print()\n", "    print(\"Analyzing '\" + genero + \"'\")\n", "    # All the words of the Brown corpus that belong to this genre\n", "    texto_generos = brown.words(categories=genero)\n", "    print(texto_generos)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have extracted the words of each genre from the Brown corpus. Next we compute the frequency distribution for each selected category." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Analyzing 'fiction'\n", "['Thirty-three', 'Scotty', 'did', 'not', 'go', 'back', ...]\n", "\n", "\n", "Analyzing 'humor'\n", "['It', 'was', 'among', 'these', 'that', 'Hinkle', ...]\n", "\n", "\n", "Analyzing 'romance'\n", "['They', 'neither', 'liked', 'nor', 'disliked', 'the', ...]\n", "\n" ] } ], "source": [ "for genero in generos:\n", "    print()\n", "    print(\"Analyzing '\" + genero + \"'\")\n", "    texto_generos = brown.words(categories=genero)\n", "    print(texto_generos)\n", "    # Count how many times each token occurs in this genre\n", "    fdist = nltk.FreqDist(texto_generos)\n", "    print(fdist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using that distribution, we can look up the count of each wh-word." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "what: 121 which: 104 how: 60 why: 34 when: 126 where: 54 who: 89 " ] } ], "source": [ "for wh in palabraswh:\n", "    print(wh + ':', fdist[wh], end=' ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we iterate over `palabraswh` and read the number of occurrences of each word from `fdist`. Because `fdist` still holds the distribution from the last loop iteration, these counts are for the romance category ('what' appears 121 times, 'which' 104, ...). Putting all the steps together..." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['adventure', 'belles_lettres', 'editorial', 'fiction', 'government', 'hobbies', 'humor', 'learned', 'lore', 'mystery', 'news', 'religion', 'reviews', 'romance', 'science_fiction']\n", "\n", "Analyzing 'fiction'\n", "what: 128 which: 123 how: 54 why: 18 when: 133 where: 76 who: 103 \n", "Analyzing 'humor'\n", "what: 36 which: 62 how: 18 why: 9 when: 52 where: 15 who: 48 \n", "Analyzing 'romance'\n", "what: 121 which: 104 how: 60 why: 34 when: 126 where: 54 who: 89 " ] } ], "source": [ "print(brown.categories())\n", "for genero in generos:\n", "    print()\n", "    print(\"Analyzing '\" + genero + \"'\")\n", "    texto_generos = brown.words(categories=genero)\n", "    fdist = nltk.FreqDist(texto_generos)\n", "    # Report the count of every wh-word within the current genre\n", "    for wh in palabraswh:\n", "        print(wh + ':', fdist[wh], end=' ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.Analyzing the frequency distribution of web and chat text corpora\n", "\n", "We will use the webtext dataset from the NLTK library." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[nltk_data] Downloading package webtext to /home/mydoctor/nltk_data...\n", "[nltk_data] Package webtext is already up-to-date!\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import nltk\n", "from nltk.corpus import webtext\n", "nltk.download('webtext')" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['firefox.txt', 'grail.txt', 'overheard.txt', 'pirates.txt', 'singles.txt', 'wine.txt']\n" ] } ], "source": [ "print(webtext.fileids())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We analyze the file 
singles.txt, which will be our \"target\" dataset." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "fileid = 'singles.txt'\n", "wbt_words = webtext.words(fileid)\n", "fdist = nltk.FreqDist(wbt_words)\n", "print(fdist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can extract the most common token. Punctuation is not filtered out, so here it is a comma..." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total occurrences of the token \" , \" : 539\n" ] } ], "source": [ "print('Total occurrences of the token \"', fdist.max(), '\" : ', fdist[fdist.max()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can extract the total number of tokens in our corpus with `fdist.N()`. Note that this counts every occurrence; the number of distinct tokens is given by `fdist.B()`." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total number of tokens in the corpus: 4867\n" ] } ], "source": [ "print('Total number of tokens in the corpus:', fdist.N())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We extract the 10 most common tokens in the corpus." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The 10 most common tokens in the corpus: [(',', 539), ('.', 353), ('/', 110), ('for', 99), ('and', 74), ('to', 74), ('lady', 68), ('-', 66), ('seeks', 60), ('a', 52)]\n" ] } ], "source": [ "print('The 10 most common tokens in the corpus:', fdist.most_common(10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We plot the cumulative frequencies of the 20 most common tokens." 
] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAE3CAYAAABb6G2FAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAA6zUlEQVR4nO3deXxU9bnH8c+TsIYdwhJA2ZcCApqAgFoV7VVv3arWpbVSr7f0qm21va3L7WJta6t28brXtq51u64VcF9ABVlMkF2WsMomhDUQSEjy3D/OCQwhyUxCZibL9/16zSszvzO/83sYYJ6c33bM3REREalKSrIDEBGRuk/JQkREolKyEBGRqJQsREQkKiULERGJqkmyA4iX9PR07927d43q7tu3j5YtW9a4bdVXfdVX/fpaPycnJ8/dOx9xwN0b5CMzM9NrKjs7u8Z1VV/1VV/163N9INsr+E5VN5SIiESlZCEiIlEpWYiISFRKFiIiEpWShYiIRBW3ZGFmj5nZFjNbFFE20sxmmdk8M8s2s9ERx241s1wzW2ZmZ0WUZ5rZwvDYfWZm8YpZREQqFs8riyeAs8uV3Q3c7u4jgV+FrzGzIcDlwNCwzkNmlhrWeRiYCAwIH+XPKSIicRa3ZOHuHwHbyxcDbcPn7YCN4fMLgOfdvdDdVwO5wGgzywDauvvMcP7vU8CF8YpZRKQ+Ki4p5bN1O3hoWi6/+Wg7W3bvr/U2Er2C+0bgbTP7E0GiGheW9wBmRbxvfVh2IHxevlxEpNEqKXWWbNzNzFV5zFy5jU/X7GBPYfHB4zNXbeOCkbX7VWkex5sfmVlvYIq7Dwtf3wd86O4vm9mlwER3P9PMHgRmuvvT4fseBd4A1gF/cPczw/JTgJvc/bxK2ptI0GVFRkZG5uTJk2sUd0FBAWlpaTWqq/qqr/qqX9v1S91Zt6uYRVuKWLS1iCVbi9h74PDv7ozWqQzr0owB7WDUMW1o27xmHUdZWVk57p51xIGKlnXX1gPoDSyKeL2LQwnKgN3h81uBWyPe9zYwFsgAlkaUXwE8Ekvb2u5D9VVf9etr/dLSUl+2ebc/MWO1f/+pbB95+9ve6+Yphz1Ovut9/9mL8/yVuV/4pp37aq19KtnuI9HdUBuBU4FpwHhgRVg+CXjWzP4CdCcYyJ7j7iVmlm9mY4DZwFXA/QmOWUQkrtydddsLmJG7jSmf7mT5m++Rt6fosPd0b9eCMf06MbZvJ8b260TPDjW/eqmJuCULM3sOOA1IN7P1wG3A94B7zawJsJ+wy8jdF5vZC8ASoBi43t1LwlNdSzCzqiXwZvgQEanX8vYU8snKbcxYkceMlXms37HvsOOd2zRnXERyOLZjGslcORC3ZOHuV1RyKLOS998B3FFBeTYwrBZDExFJuL2FxcxZvZ3puXnMyM1j6eb8w463a9mUcf060bNZAZeddjz9OrdKanIor8Hez0JEJJkOlJQy74udTF+Rxycr8/hs3U6KSw8NSjdvksLoPh05qX86J/VLZ0j3tqSmGDk5OfTv0jqJkVdMyUJEpBaUljrLvsxn8vK9PLBgDrNXb6egqOTg8RSDkce05+T+6Yzr34kTju1Ai6apVZyxblGyEBGpoS+2F/DJyjym525j5sq8iEHpoIupf5fWnNSvEyf1T+fEvp1o17Jp8oI9SkoWIiIx2r63iJkrtzE9N+haWrut4LDj3dq2YHAHOP/EQYzrl063di2SFGntU7IQEanEvqIS5qzZzie5eUzPzWPJpt1ErmNu06IJY/t24uQB6Yzrl06/zq2YO3cumSf0TF7QcaJkISISKi4pZdm2Ij55fwXTc4NB6aKS0oPHmzVJIatXh2BQun86w7q3pUlq47jTg5KFiDRa7s6KLXuYEU5nnb1qO/
mFxZTtgWoGw3u2Y1y/dE7un05W7/o1KF2blCxEpFHZuHPfweQwY+U2tuYXHnY8o3UqZwzrwcn90xnTtxPt05olKdK6RclCRBq0XQUHmLkqjxm525iRm8eqvL2HHe/cpvnBGUsn9U9n08olZGYel6Ro6y4lCxFpUApLnOnhFhozcvNYuGHXYYPSrZs3YUzfjkHX0oB0BnRpfdhK6U1JiLk+ULIQkXqttNRZujmfj1ds5eMVecxelceB0i8PHm+aapxw7KFB6eE929G0kQxK1yYlCxGpd7bk72f6ijw+Dh95ew4fdxiS0TacztqJ0X06ktZMX3VHS5+giNR5+w+U8Oma7Xy8Io+Plm89YhO+bm1bcMqAdE4Z2JnWezcwftyoJEXacClZiEid4x7ss/Tx8jw+WrGVOau3U1h8aL1Di6YpjOnbiVMGdOarA9LpHzHukJOjUYd4ULIQkTph+94iPl6xlVfn7GTJW++zpdyU1qHd2x5MDpm9O9C8SeNc75AsShYikhQlpc789Tv5cNlWpi3fyoL1Ow+btdSlTfMgOQwMBqbTWzdPXrCiZCEiibM1v5CPlgfJ4eMVW9lZcODgsWapwf0d+qUV8q3xxzOwa+s6dfOfxk7JQkTipriklM++2Mm0ZVv4cPlWFm3YfdjxYzumcdqgzpw6sDNj+3UirVkTcnJyGNStTZIilsooWYhIrdq2r4QXPv2Cacu38PGKPPL3Fx881rxJCmP7deLUgZ05bVAXendK7n2lJXZKFiJyVEpKnc/W7eCDpVv4YOmWcFrr1oPH+6a34tTw6mFM306NdiO++k7JQkSqbWdBER8u38rUpVuYtvzwsYfmqcYpA4PkcOrALhzbKS2JkUptUbIQkajK1j18sHQLU5duIWftDkojZi716pTG6YO6MH5wF5rtXMuY0VnJC1biQslCRCq0r6iET1bm8cHSLUxbtpUNO/cdPNYkxRjTtyPjB3fh9MFd6JveKmJR3LpkhSxxFLdkYWaPAecCW9x9WET5D4EfAMXA6+5+U1h+K3ANUAL8yN3fDsszgSeAlsAbwA3ukbOxRaS2bNy5j7dyC3hgwRw+WbntsFXT6a2bc/qgzowf3IWTB6TTpkXTJEYqiRbPK4sngAeAp8oKzOx04AJguLsXmlmXsHwIcDkwFOgOvGdmA929BHgYmAjMIkgWZwNvxjFukUZldd5e3ly0ibcXbWb++l2HHRvRsx2nDw66l4Z1b0dKimYuNVZxSxbu/pGZ9S5XfC1wp7sXhu/ZEpZfADwflq82s1xgtJmtAdq6+0wAM3sKuBAlC5Eacw+29H5r0WbeWrSZZV8e2pSvZdNUhndpwiVjB3HaoC50bqNV0xKwePbohMliSlk3lJnNA14juDrYD/zU3T81sweAWe7+dPi+RwkSwhqC5HJmWH4KcLO7n1tJexMJrkLIyMjInDx5co3iLigoIC2t5jM4VF/161p9dyd3xwFmrS9k1ob9bN5TcvBYWlNjVPfmnNijBSO7NaekcF+di1/1E1c/Kysrx92PnKHg7nF7AL2BRRGvFwH3AQaMBlaHzx8Erox436PAxcAo4L2I8lOAybG0nZmZ6TWVnZ1d47qqr/p1pX5xSanPXJnnt722yMf8/j3vdfOUg48TfvOO3/LyfJ+2bIsXHiiJS/uqXz/rA9lewXdqomdDrQdeCQOaY2alQHpYfkzE+3oCG8PynhWUi0gFDpSU8tnmQl56ZQHvLP6SbXuLDh7r1rYFZw/rxtnDujGqd0dSNf4g1ZDoZPEvYDwwzcwGAs2APGAS8KyZ/YVggHsAMMfdS8ws38zGALOBq4D7ExyzSJ1WWup8umY7r83fyBsLN4UL5HYAwfqHs4d14+yh3RjRs70GqKXG4jl19jngNCDdzNYDtwGPAY+Z2SKgCJgQXmUsNrMXgCUEU2qv92AmFASD4k8QTJ19Ew1ui+DuLNm0m0nzNjJp/kY27dp/8FjPtk24aFQfzhnWjcHd2mjvJa
kV8ZwNdUUlh66s5P13AHdUUJ4NDDuyhkjjs3bbXibN28hr8zeSu2XPwfIe7Vty/sjuXDCyO3s3rCAzc2ASo5SGSCu4Req4rfmFTFmwkdfmbWTeFzsPlnds1YyvH5fBBSO7c8KxHQ52MeVsSFKg0qApWYjUQfn7D/D24i95bd4GZuTmHdyHKa1ZKmcN7cb5I7tzcv90mqamJDdQaTSULETqiJJSZ9qyLfxj5g5yXn2PonCrjaapxviBXbhgZHfO+EoX0prpv60knv7ViSTZ7v0HeDF7PU9+soZ12wsOlp/YpyMXjOzBOcO60aFVsyRGKKJkIZI0uVv28NTMNbyUs56ComDyX88OLTmtZyrXfX003du3THKEIocoWYgkUGmp8+HyrTz+yRo+Wn7obnLj+nXiu+N6c8ZXujLvs7lKFFLnKFmIJED+/gO8lBN0Na3ZFnQ1tWiawjeO78GEcb0Z3K1tkiMUqZqShUgcrdq6h6dmruXF7C/YG3Y19Wjfku+M7cVlWcdoLELqDSULkVpW6sGspic+WcO0ZYe6mk7s05GrT+rNmV/pShNNeZV6RslCpJYUFZfyYs4XPPheHhvzvwSgeZMULhwZdDUN6a6uJqm/lCxEjtKBklJezlnP/R/kHrxPdUa7FnxnbC8uH3UsHdXVJA2AkoVIDRWXlPKveRu57/0VB9dHDOjSmvP6pnLdeePU1SQNipKFSDWVlDqT52/k3vdXsDpvLwB9O7fixjMH8vXjMpj32VwlCmlwlCxEYlRa6ry+cBP/+95yVm4NkkTvTmnccOYAzh/RQzcTkgZNyUIkitJS5+3Fm7nnveUs/zLYFrxnh5b86IwBXHR8D11FSKMQNVmYWStgn7uXhne3Gwy86e4H4h6dSBK5O+8u+ZJ73lvB55t2A8EaiR+M788lmT2146s0KrFcWXwEnGJmHYD3gWzgMuDb8QxMJFncnQ+Wfslf3l3Oog1BkujWtgXXj+/PpVk9ad4kNckRiiReLMnC3L3AzK4B7nf3u83ss3gHJpIM01fkcfsH21mxPVgn0blNc64/rR+Xjz6WFk2VJKTxiilZmNlYgiuJa6pRT6TeWLetgN+9voR3lgRJIr11M/7r1H5cOaaXkoQIsX3p3wDcCrzq7ovNrC8wNb5hiSRGQVExD09bySMfraKouJRWzVK5cGBLfn7pSbrJkEiEWP43dHX388teuPsqM/s4jjGJxJ27M2XBJn7/xuds2rUfgIuO78HN5wxm/YrFShQi5cTyP+JW4MUYykTqhSUbd/PryYuZs3o7AMN6tOX284eS2asjAOuTGZxIHVVpsjCzc4B/B3qY2X0Rh9oCxdFObGaPAecCW9x9WLljPwX+CHR297yw7FaCMZES4Efu/nZYngk8AbQE3gBucHeP9Q8oUmbH3iL+8u5ynpm9llKHjq2acdNZg/hm1jFaUCcSRVVXFhsJpsmeD+RElOcDP47h3E8ADwBPRRaa2THA14B1EWVDgMuBoUB34D0zG+juJcDDwERgFkGyOBt4M4b2RYBge45n56zjz+8sY2fBAVJTjKvH9eLGMwbSLq1pssMTqRcqTRbuPh+Yb2bP1mQBnrt/ZGa9Kzh0D3AT8FpE2QXA8+5eCKw2s1xgtJmtAdq6+0wAM3sKuBAlC4nR7FXb+PXkJQcX1Y3r14lfnz+UgV3bJDkykfrFovXomNlJwK+BXgTJxQB3975RTx4kiyll3VBmdj5whrvfECaCLHfPM7MHgFnu/nT4vkcJEsIa4E53PzMsPwW42d3PraS9iQRXIWRkZGROnjw5WogVKigoIC0trUZ1Vb9u1C+gOU8tyGfGF8HgdXpaCt8d0ZYxPZpjVnWXU12IX/VVP1n1s7Kyctw964gD7l7lA1gKnAN0ATqVPaLVC+v2BhaFz9OA2UC78PUaID18/iBwZUS9R4GLgVHAexHlpwCTY2k7MzPTayo7O7vGdVU/ufX3FRX7Lf+c5oN/8ab3unmKD/z5G37Pu8
u8oLA4Ie2rvurX9/pAtlfwnRrLbKhd7l4b3T79gD4EXVsAPYG5ZjaaYALKMRHv7UkwZrI+fF6+XOQIM1du49ZXFrBmW3BviX8/rhv/8+9foWeHmv+WJSKBWJLFVDP7I/AKUFhW6O5zq9OQuy8kuDoBoFw31CTgWTP7C8EA9wBgjruXmFm+mY0huCq5Cri/Ou1Kw7dr3wHufPNznpvzBQA92zbh7kszGdc/PcmRiTQcsSSLE8OfkX1YDoyvqpKZPQecBqSb2XrgNnd/tKL3erAy/AVgCcG03Os9mAkFcC2Hps6+iQa3JcLbizfzy38tYkt+IU1TjR+cPoAT2+1ijBKFSK2Kmizc/fSanNjdr4hyvHe513cAd1TwvmxgWPlyady25O/n15MW88bCzQCccGx77rp4OAO6tiEnJydKbRGprljuZ/Grisrd/Te1H45I1dydF3PWc8frn7Nr3wHSmqVy01mD+M7Y3lpYJxJHsXRD7Y143oJgVfbn8QlHpHLrthXwP68uZHpuHgCnDuzMHd8YpgFskQSIpRvqz5GvzexPwKS4RSRSTkmp8/iM1fz5neXsO1BCh7Sm/Oq8IVw4skfUNRMiUjtqsrVmGhB1QZ5IbVi6eTc3v7SA+et3AXD+iO786rwhpLdunuTIRBqXWMYsFhLMfgJIBToDGq+QuCosLuHBD3J5aNpKikudjHYt+N2FwzjjK12THZpIoxTLlUXk1hrFwJfuHnXXWZGaWppXxM33TSd3yx4ArhxzLDefPZg2LbTpn0iyxDJmsdbMRhBstQHwEbAgrlFJo1RQVMzdby3jyU+240Df9FbcefFwRvfpmOzQRBq9WLqhbgC+R7CCG+AZM/ubu2sltdSamSu3cdPL8/li+z5SDf7rtH78cPwA3f9apI6IpRvqGuBEd98LYGZ3ATPRthtSC/YWFnPXW0t5auZaAL6S0ZZrhjXlkjMGJzkyEYkUS7IwgrvXlSkJy0SOyie5edz08gLW79hHkxTjh+MHcO1p/Vg4/7NkhyYi5cSSLB4HZpvZq+HrCwm2EBepkT2Fxdz55uc8PSu4WeLQ7m354yUjGNK9bZIjE5HKxDLA/RczmwacTHBFcbW761c/qZEZuXnc9NICNuzcR9PUQ1cTTVNTkh2aiFSh0mRhZqMIbk70Zrgd+dyw/HwzS3F37dYmMcvff4A/vLmUZ2cHVxPDegRXE1/J0NWESH1Q1ZXFH4HvVlC+BPgbUbYoFynz8Yqt3PLywoNXEzecMYDvn6qrCZH6pKpk0cnd15QvdPdcM+sUv5Ckocjff4Dfv3HopkTH9WjHn745gkHd2iQ5MhGprqqSRcsqjrWq7UCkYflo+VZueXkBG3ftp1lqCjecOYDvf7UvTXQ1IVIvVZUs3jOzO4BfhDfxBsDMbgc+iHtkUi/t3n+Ah7J38f7qOQCM6NmOP35zBAO76mpCpD6rKln8N/APINfM5oVlI4Bs4D/jHJfUQ5+u2c6Nz89jw859NEtN4cdfG8j3TumjqwmRBqDSZBGu2L7CzPoCQ8Pixe6+KiGRSb1RXFLK/R/kcv8HKyh16N+hKQ9/dywDdDUh0mDEss5iFaAEIRVav6OAG5+fR/baHZjBdaf149ROe5QoRBqYmtz8SASA1xds4pZXFpC/v5iubZtzz6UjGdc/nZwcLcERaWiULKTaCoqKuX3SEv4vO5gSe+ZXunL3JcPp2KpZkiMTkXiJaeTRzE42s6vD553NrE8MdR4zsy1mtiii7I9mttTMFpjZq2bWPuLYrWaWa2bLzOysiPJMM1sYHrvPdNPlpFq0YRfn3j+d/8v+gmZNUvjtBUP5+1WZShQiDVzUZGFmtwE3A7eGRU2Bp2M49xPA2eXK3gWGuftwYHnZOc1sCHA5wUD62cBDZlZ2I4OHgYnAgPBR/pySAO7Oo9NXc9FDn7Bq614Gdm3NpB+cxHfG9kb5W6Thi6Ub6hvA8YR7Q7n7RjOLOnrp7h+ZWe9yZe9EvJwFXB
I+vwB43t0LgdVmlguMNrM1QFt3nwlgZk8R7Hr7ZgxxSy3J21PIT1+cz7RlW4HgNqe/+PoQ3ZhIpBGJJVkUububmQOYWW2t3v4P4P/C5z0IkkeZ9WHZgfB5+XJJkA+Xb+W/X5hP3p5C2qc15a6Lh3PW0G7JDktEEswiFmdX/AaznxJ0/3wN+APBl/yzsdxWNbyymOLuw8qV/xzIAi4KE9GDwEx3fzo8/ijwBrAO+IO7nxmWnwLc5O7nVdLeRIIuKzIyMjInT54cLcQKFRQUkJaWVqO6DaV+0+YteWZRPpOXFwAwtHMzbhjdjk5p0a8m6kL8qq/6ql8zWVlZOe6edcQBd4/6IEgUfwT+BHwtljphvd7AonJlEwhuy5oWUXYrcGvE67eBsUAGsDSi/ArgkVjazszM9JrKzs6ucd2GUH/S1Fn+9fs+8l43T/G+t77uD3ywwotLShPWvuqrvuonrz6Q7RV8p0bthjKzHwMvuvu7NU5Vh851NsFg+anuXhBxaBLwrJn9BehOcCUzx91LzCzfzMYAs4Gr0L2/48bdeTFnPb96dxv7S5xjOrbk3suP54RjOyQ7NBFJsljGLNoCb5vZduB54CV3/zJaJTN7DjgNSDez9cBtBFcQzYF3wxk0s9z9v9x9sZm9QHCvjGLgencvu+/3tQQzq1oSDGxrcDsOdu07wM9fXciUBZsAOH9Ed373jWG0bdE0yZGJSF0Qy3YftwO3m9lw4DLgQzNb7+E4QhX1rqiguNJ7d7v7HcAdFZRnA8OOrCG1JWftdn70XLABYFqzVP5jRGv++6KRmhIrIgdVZwX3FmAzsA3oEp9wJJFKSp0Hp+Zy7/srKCl1juvRjvuuOJ7ta5cqUYjIYWIZs7iW4IqiM/AS8D13XxLvwCS+Nu7cx4//bx6zV28H4Ptf7ct//9sgmjVJYfvaJAcnInVOLFcWvYAb3X1enGORBHlr0WZufnkBu/YdoHOb5vzl0hGcMqBzssMSkTqs0mRhZm3dfTdwd/i6Y+Rxd98e59iklu0rKuG3ry/h2dnrADh9UGf++M0RpLdunuTIRKSuq+rK4lngXCAHcCCyE9uBvnGMS2rZ55t286PnPmPFlj00S03hlnMGc/VJ2tdJRGJT1Z3yzg1/Rt1hVuoud+fJT9bw+zeXUlRcSr/Orbj/ihMY0r1tskMTkXoklgHu9939jGhlUvds21PITS8t4P2lWwC4YvSx/OrcIbRspg0ARaR6qhqzaAGkESyq68Chbqi2BKuspQ6bviKPn7wwjy35hbRt0YS7Lh7OOcdlJDssEamnqrqy+D5wI0FiyOFQstgNPBjfsKSmiopL+eeCfF5bPht3GN27I/dcPpIe7VsmOzQRqceqGrO4F7jXzH7oMewwK8m3s6CIq5/4lM/W7SXF4MYzB/KD8f1JTdEgtogcnVi2+7jfzIYBQ4AWEeVPxTMwqZ68PYVc+Y/ZLN2cT3rLFP464USyeneMXlFEJAaxDHDfRrAh4BCCe0ycA0wHlCzqiC279/Otf8wmd8se+qa34pYxaUoUIlKrot6Dm+DWp2cAm939amAEwc6xUgds3LmPSx+ZSe6WPQzs2prnvz+GTi0120lEalcsyWKfu5cCxWbWlmBDQS3IqwO+2F7ApY/MZM22AoZ2b8vzE8fSpU2L6BVFRKoplr2hss2sPfB3gllRe4A58QxKolu1dQ/f+vtsNu/ez8hj2vPk1aNpl6Z7T4hIfMQywH1d+PSvZvYW0NbdF8Q3LKnK8i/z+dbfZ5O3p5BRvTvw2HdH0UY3KRKROKpqUd4JVR1z97nxCUmqsnjjLr7z6By27y1iXL9O/GNCFmnNqnNbEhGR6qvqW+bPVRxzYHwtxyJRzP9iJ995dDa79xdz2qDO/PXKTFo01WC2iMRfVYvyTk9kIFK17DXb+e7jn7KnsJivDenKA986nuZNlChEJDFiWWdxVUXlWpSXODNXbuOaJz+loKiErw/P4H8vG0nT1FgmsomI1I
5YOrtHRTxvQbDmYi5alJcQHy7fysSnsiksLuWi43tw9yXDaaJEISIJFstsqB9GvjazdsA/4xaRHPTeki+57pm5FJWUcvmoY/j9N44jRfs8iUgS1GQaTQEwoLYDkcO9sXATP3ruM4pLnQlje3HbeUOVKEQkaaL2Z5jZZDObFD6mAMuA12Ko95iZbTGzRRFlHc3sXTNbEf7sEHHsVjPLNbNlZnZWRHmmmS0Mj91njeA+oB+t3ccPnp1Lcanz/a/25dfnK1GISHLFcmXxp4jnxcBad18fQ70ngAc4fGzjFuB9d7/TzG4JX99sZkOAy4GhBPfPeM/MBrp7CfAwMBGYRbCR4dnAmzG0Xy+98OkX3DdnFw78aHx/fvy1gbpPtogkXdQrC3f/0N0/BD4DPgcKzCzqlqbu/hGwvVzxBcCT4fMngQsjyp9390J3Xw3kAqPNLINgxfhMd3eCxHMhDdRr8zZw8ysLcOBnZw3iJ/82SIlCROoEC76Dq3iD2UTgt8A+oJTgjnnu7lE3EzSz3sAUdx8Wvt7p7u0jju9w9w5m9gAwy92fDssfJbh6WAPc6e5nhuWnADe7+7lVxDoRICMjI3Py5MnRQqxQQUEBaWlpNapb0/o5m/Zz14ydlDhcOqg5lw3vEL1SLbav+qqv+qoPkJWVlePuWUcccPcqH8AKID3a+yqp2xtYFPF6Z7njO8KfDwJXRpQ/ClxMMG33vYjyU4DJsbSdmZnpNZWdnV3jujWpP2tlng/8+Rve6+Yp/oc3Pk94+6qv+qqv+mWAbK/gOzWWCfsrCWZA1YYvw64lwp9bwvL1wDER7+sJbAzLe1ZQ3mAs2rCL/3wyWEdxxehjuPnsQckOSUTkCLEki1uBT8zskXA20n1mdl8N25sETAifT+DQrKpJwOVm1tzM+hBMzZ3j7puAfDMbE86CuooYZmLVFyu37mHCY3PILyzm68dl8LsLj9MYhYjUSbHMhnoE+ABYSDBmERMze47gdqzpZrYeuA24E3jBzK4B1gHfBHD3xWb2ArCEYMbV9R7MhAK4lmBmVUuCcYwGMRNq4859fOcfs9m2t4hTBqRzz2UjSdX0WBGpo2JJFsXu/pPqntjdr6jk0BmVvP8O4I4KyrOBYdVtvy7btqeQKx+dzcZd+8ns1YFHvpNJsybawkNE6q5YvqGmmtlEM8sIF9V1jGXqrFQsf/8BJjw+h1Vb9zK4WxsemzBK96MQkTovlm+pb4U/b40oc3Qf7mrbf6CE/3wym0UbdtOrUxpPXaNboYpI/RDLRoJ9EhFIQ3egpJTrn5nL7NXb6dq2OU9fcyJd2rRIdlgiIjHR/SwSoLTU+dmL83l/6RbapzXln9ecyDEda75oRkQk0XQ/izhzd26fvJh/zdtIq2apPHH1aAZ2bZPssEREqkX3s4ize95bwZMz19IsNYW/X5XFyGPaJzskEZFqq8l8Td3PIkaPTl/Nfe+vIMXg/m8dz7j+6ckOSUSkRmIZs5hMMPsJguQyBHghnkE1BC/lrOe3U5YAcNfFwzlraLckRyQiUnPxvJ9FozV7w37+PGsBAL88dwjfzDomSg0Rkbqt0mRhZv2Brh7cyyKy/BQza+7uK+MeXT30SW4ef5m1k5LS4OZF15ysmcciUv9VNWbxv0B+BeX7wmNSzt7CYv7r6RyKS2HC2F78+GsDkx2SiEitqCpZ9Hb3BeULw72aesctonpsem4eu/cX069DE247b6h2kBWRBqOqZFHV8uKWtR1IQzBtWXB7jtHdW5CiHWRFpAGpKll8ambfK18Ybi+eE7+Q6id3Z+rSrQCckNE8ydGIiNSuqmZD3Qi8ambf5lByyAKaAd+Ic1z1ztLN+WzevZ8ubZrTp712kRWRhqXSbzV3/xIYZ2anc+h+Eq+7+wcJiaye+WBp0AV12qDOmBUnORoRkdoVy3YfU4GpCYilXisbrzh9UBcoalC3CRcRqdF2H1LOroID5KzdQZMU4+QB2tJDRBoeJYta8NGKrZ
Q6jOrdkTYtdDMjEWl4lCxqwdSyLqjBnZMciYhIfChZHKXSUufDZcGU2dMHdUlyNCIi8aFkcZQWbNjFtr1F9Gjfkv5dWic7HBGRuEhKsjCzH5vZYjNbZGbPmVkLM+toZu+a2YrwZ4eI999qZrlmtszMzkpGzJWZGk6ZHT+4i7b3EJEGK+HJwsx6AD8Cstx9GJAKXA7cArzv7gOA98PXmNmQ8PhQ4GzgITNLTXTclZmm8QoRaQSS1Q3VBGhpZk2ANGAjcAHwZHj8SeDC8PkFwPPuXujuq4FcYHRiw63Y1vxC5q/fRbMmKYztqymzItJwmbtHf1dtN2p2A3AHwXbn77j7t81sp7u3j3jPDnfvYGYPALPc/emw/FHgTXd/qYLzTgQmAmRkZGROnjy5RvEVFBSQlpYW9X1T1+zjgU93cXy3ZvzilI7Vrn+07au+6qu+6td2/aysrBx3zzrigLsn9AF0AD4AOgNNgX8BVwI7y71vR/jzQeDKiPJHgYujtZOZmek1lZ2dHdP7rnsmx3vdPMWfmLG6RvWPtn3VV33VV/3arg9kewXfqcnohjoTWO3uW939APAKMA740swyAMKfW8L3rwci70vak6DbKqmKS0r5aLmmzIpI45CMZLEOGGNmaRZMHzoD+ByYBEwI3zMBeC18Pgm43Myam1kfYAAwJ8ExHyFn7Q7y9xfTt3Mrju1U80s+EZH6IOF7abv7bDN7CZgLFAOfAX8DWgMvhPfLWAd8M3z/YjN7AVgSvv96dy9JdNzlTdVCPBFpRJJy4wV3vw24rVxxIcFVRkXvv4NgQLzOOGyXWRGRBk4ruGtg4859LN2cT6tmqYzq0yF6BRGRek7JogamhV1QJ/VPp3mTOrM+UEQkbpQsaqDsrninD1YXlIg0DkoW1VRYXMKM3DwguIWqiEhjoGRRTXNWb2ffgRK+ktGWjHYtkx2OiEhCKFlU09SlZVNmdVUhIo2HkkU1HbornsYrRKTxULKohtV5e1mdt5e2LZpw/DHtkx2OiEjCKFlUQ9lCvK8O7EyTVH10ItJ46BuvGsq2+BivLigRaWSULGJUUFTMrFXbMAuuLEREGhMlixh9kruNouJShvdsT3rr5skOR0QkoZQsYnRwFpSmzIpII6RkEQN3P7gflMYrRKQxUrKIwYote9iwcx/prZsxrHu7ZIcjIpJwShYxKNs48NSBXUhJsSRHIyKSeEoWMZh6cJdZjVeISOOkZBHF7v0HyF67g9QU45QBShYi0jgpWUQxfUUeJaVOZq8OtGvZNNnhiIgkhZJFFAe7oHSvbRFpxJQsqlBa6ge3+NB4hYg0ZkoWVVi8cTd5ewrJaNeCQV3bJDscEZGkSUqyMLP2ZvaSmS01s8/NbKyZdTSzd81sRfizQ8T7bzWzXDNbZmZnJSrOyHtXmGnKrIg0Xsm6srgXeMvdBwMjgM+BW4D33X0A8H74GjMbAlwODAXOBh4ys9REBHloiw+NV4hI45bwZGFmbYGvAo8CuHuRu+8ELgCeDN/2JHBh+PwC4Hl3L3T31UAuMDrecW7bU8i8L3bSLDWFcf06xbs5EZE6zdw9sQ2ajQT+BiwhuKrIAW4ANrh7+4j37XD3Dmb2ADDL3Z8Oyx8F3nT3lyo490RgIkBGRkbm5MmTaxRjQUEBn2417puzixFdm/Grr3asdv20tLQata36qq/6qp/M+llZWTnunnXEAXdP6APIAoqBE8PX9wK/BXaWe9+O8OeDwJUR5Y8CF0drJzMz02sqOzvbf/jsXO918xT/x8eralT/aKi+6qu+6ierPpDtFXynJmPMYj2w3t1nh69fAk4AvjSzDIDw55aI9x8TUb8nsDGeAZa48+Fy7TIrIlIm4cnC3TcDX5jZoLDoDIIuqUnAhLBsAvBa+HwScLmZNTezPsAAYE48Y1yx7QC79h2gd6c0+qS3imdTIiL1QpMktftD4BkzawasAq4mSFwvmNk1wDrgmwDuvtjMXiBIKMXA9e5eEs/gcj
YVAnCaZkGJiABJShbuPo9g7KK8Myp5/x3AHfGMKdJnm4Nkcbq6oEREAK3gPsLmXftZvbOYlk1TObFP9WZBiYg0VEoW5Xy4PBhXP6l/J1o0TcjaPxGROk/Jopyyu+JpvEJE5BAliwhFxaVMX5EHwGmDtMusiEgZJYsIq/P2kmLGMW2b0LNDzVdAiog0NMmaOlsnDerWhrm/+hrvzvg02aGIiNQpurIop2lqCl1bKYeKiERSshARkaiULEREJColCxERiUrJQkREolKyEBGRqJQsREQkKiULERGJKuH34E4UM9sKrK1h9XQg7yiaV33VV33Vr6/1e7n7kfsdVXSv1cb+oJJ70Kq+6qu+6jf0+pU91A0lIiJRKVmIiEhUShYV+5vqq77qq34jrV+hBjvALSIitUdXFiIiEpWShYiIRKVkkURmNtbMLNlxNCRm1i3ZMYg0REoWlTCzDDNrHudmJgA5Zva8mX23Jl90ZpZqZk/HIbb66o1kNWxmHcxstJl9teyRgDZHRf67MbOrzOw1M7vPzDpW81zNzGy4mR1nZs1qGE/riOf9a3IOqRkzG2FmPwgfI2r9/BrgrpiZvQf0A15295/W8Bzd3H1zDO8bDJwDnAW0A6YCbwEz3L0khvpvA+e5e1FN4qwNZtYVGBW+nOPuW6pZPxt4HHjW3XccRRyfufvxNah3N/A7YB/BZz8CuNHdY0rEZvafwA1AT2AeMAaY6e7jqxHDOKA3Ebc7dvenotSZC5zp7tvD5PQ88ENgJPAVd78kxra/DvwVWAkY0Af4vru/GWv84XnmA6uBZ4E/uHu/atTtBowGHPg0lv87Yb3mwMUc+dn9JvbIa/b5R9TNAn4O9ArrW1Ddh0ept5Dgz1uhaPUjznMD8D3glbDoG8Df3P3+WOrH1IaSReXCLqIh7r64hvVfd/evV7NOS+B0guQx1t2zYqjzCHACMAnYW1bu7n+JoW4+Vf9jbRvDOS4F/ghMI/hPcgrwM3d/KVrdiHP0B64GLgPKEsc7Xs1/oGZ2nbs/VJ06Yb157j7SzL4BXAj8GJjq7jH9hhb+px8FzArPMxi43d0vi7H+Pwl+OZkHlP2C4O7+oyj15pfFaGYPAlvd/deRf6YY218KnOvuueHrfsDr7j44Sr00oMjdiyPKrgUeAC539xdjbP8/gV8BHxD8GzoV+I27PxZD3beAXUAOhz473P3PsbQdnqNGn39E/WXAz4CFQGlEDFVuOWRmvcKn14c//xn+/DZQEGvCM7MFBN8Xe8PXrQh+WYkp2cQkHsvC9UjsA7itokc1z/Eb4DqgDdAWuBa4Kca684EuEa87A/Nr+GdJAc4HNgBfALcDHRPwGS4Of/4dOLvsz1WN+p+GP+cBzcueV6P+54S/vFUz7kVAk/D5UuCrkceqcZ6Pyr228mWV1JsFdIt4/Q1gAXAmQbKJtf1lQKeI152AZbF+BrXw91+jzz+i/vSjbH9GLGVV1F8ItIh43QJYeLSfS+Tj4OWW1F/ufjuAmbUJXvqeGpzmLHc/MeL1w2Y2G7g7hropfni30zZqMB5mZsMJri7+HXgZeAY4meC3zZHVPV81TQ5/u94HXGdmnYH91ai/3szaA/8C3jWzHcDGatRfBHQDNlWjDsBzwIdmlkcQ+8dw8EptVzXOs9jM3gBeILjS/CbwqZldBODur1RSr6WH3UVmNpGgK+QMd99qZndWo/31QH7E63yCXxZi8YmZHefuC6vRXnk1/fzL3GZm/wDeBwrLCqv43MprZWYnu/t0ONgl1qoa7T8OzDazV8PXFwKPVqN+VOqGagDMbBjB5WvZgGYecJVXo/vMzD4BHiTo83bgCuB6dx8XQ927Cfr4nwuLLgMWuPvN1Wg/B9hJ8A/8ZXcvjDj2irtfFOu5asrMOgC73b0kvIxv7e5f1uA8pxKMPb3lUcaRzGwywefdhiAhzuHwL5vzY2hvDJBB0G1X1g0xMIx/bowxP17FYXf3/6ik3gfAh8AxwEXAoD
BRZABve+x97k8BxwGvEXweFxB8FsvDACrtUjWzJUB/grGSQmIcLyh3jqnU8PMP6z8NDAYWc6gbqtLPrYL6mcBjBP9uIPi/8B+x/v2F5ziB4JersqvCz2KtG9P5lSzqv/CL/ufuPjV8fRrw+1i+6CPO0Ru4FziJ4D/rDIIB3jUx1L0LmE3EP1RgTDWTRV93XxXr+2ubmT0W+R87nNXzmrufEed2T63quLt/GM/2j5aZdSLosiwiGBz/H4JuydMJ/k0+G+N5bqvqeNnVcyV1e1VU7lHGC8qdo8K/h1g/fzNb6O7HxdpeFedpS/C9XJ2rwoRQsmgAIgc5qyqLY/tz3f2EcmULYvnNzsx+UtXxqn6jrE1m9lsg3d2vDa8wXgf+7u5V/cZdm+3fVT65VlQWh3Zvcve7zex+Kpjo4DEO8EacrzvBLxwL3H1ZLYUZrc1jKyp393WJaD+M4e/APe6+pJr16sS//1hozKJhWGVmv+TQTIorCS7JYxb20X+PI6cOVnoZHc56uQ7oG87GKNOG4MokFm3Cn4MIZhNNCl+fR3CFkhDu/kszu8vM/gpkAne6+8uJah/4GlA+MZxTQVlt+zz8mV0bJ3P3jUBMM6AAzOx/3f3GiO648ueLpRvo9bCuEQzs9iEYMB8aQ/vT3f3kCmYFlnVlRZ0NGDoZmGBm1e0KaxPleJ2hK4t6zMz+6e7fCX876c2hbqAPCaZtxrxeIezK+pgjpx9W+oVpZu2ADsAfgFsiDuW7+/Zq/FEws3eAi909P3zdBnjR3c+uznmqq2wAt+wl8EuCfuu3oFoDlDVt/2DCJejGKdOGYDbMlfFsP9nMLNPdc462G6jcOU8gWCPy/aMOMPY2j7orrK5TsqjHwoG9cwh+Gz+d8LeZsuPV+cKuzpz8eAhnIo0oG9gOF1rN9yjz/Guh3RoN7NZi+7WWcI8yjoHATznyyjLmRYV1SUVdo3Fqp62777ZKVstH+zus7W7AeFI3VP32V4LfgPtyeDdCWdLoW41zTTGzf3f3ZG2X8U9gTjj1zwnm68e0evZouPvV8W4jegi+xsyuL3/AzDomMGG8SPDv6R9EXFkmipmdBPyaI1dAR/03XK7fP4WgG3FrHMKsyLPAuQRX5GVdYWVi+T9Yq92A8aQriwbAzB5292uP8hz5BPO6C4EDVL/P9qiF0wdPDl/W+tS/KG0PBB4Gurr7sHDNx/nu/rs4tzvF3c8N+7qP+LKJ5cuyluLIcffMRLRVSftLCVbNl+8G3VZFnbJu2J3APWFxMbCGYPp1ddbJSBRKFnJQeCk9gGCQEEj81E0z61Ku/YTMaDGzDwm2a3jEw72lzGyRuw9LUPv/JBjQ/9jdlyaizbDdsu6THwFbgFc5fJ1BQq5szGy2H74oNJY6Zd2wk4HTyh9PROzh+EilqrHOZSoVd0PVmW5AJQsBKNubp/xGeJ/Ee51BRPvnA38GuhN8aR0LLHX3qDNaaqn9T919lEVsRJjIcRwzG09wVXUKQdfFZwSJ4944t1vhFc3BJ3G+son4sr0USCXYCC8yWVX6ZWtmPyJY49GHw1fLx9yFdbTCL/nyIj+/mL7sw6vqMi0INkYsdvebji7C2qNkIQBHvRFeLbQ/HxgPvOfux5vZ6cAV7j4xQe2/CfyAYAbWCWZ2CXCNu5+TiPbDGFIJ/g5OB/4L2BfvAf6Iti8lWHG+O5yGfQLw2+qsIK5huxV92ZbxWL5sa6Mb9mjF4/Mzsw/dvcpFm4mkAW4ps9/d95sZZtbc3Zea2aAEtn/A3beZWYqZpbj71HBleKJcT3Cj+8FmtoFgnUrCpq2a2fsEY0YzCaYwj/JqbvN+lH7h7i+Y2ckEaz7+TDCGU62uoepy99Nr4RxJTRSho/r8ys2mSgGyCPaqqjOULKTM0W6Ed7R2hltsfAQ8Y2ZbCAYrE8KDrUbOtGBPqJSy9R4JtIBgFs8wgg0Ad5rZTHffl6D2ywaVvw781d1fM7NfJ6htLLgfw+MEGwj+ne
A381vc/Z1ExXCUjvbzK5tNBYcG6a+ptehqgbqh5AhWjY3warHNVgS7vBrBXv7tgGeqmg1Ty+13BX4PdHf3c8xsCMH9AWp1584Y4mhNsPPuTwm2/o733RrL2p1CsC38mQRJax/BTawStWXMfHcfYWZnEVzl/RJ4PBFrJWrD0X5+FtzH5jqCcSsnuLp8uC7N6FKyEOHgmMXjBJvfjTCzJsBnXgubw8XY/g8IBrczgbUcmhn1QYLaTwPOJrgHwgoLdo09LlG/2Vu4l5iZ3QtMc/dXrYZ3PUyGo/38zOwFYDfBtvwQ7Prcwd2/GZeAa0DJQpKqgj15Dh4iges86sBsqJ8RJIgcj7jrXGMRrqTvQTCzaQTBzKhpyVz7kUiW5M1AY6ExC0kqd68rG6nttWC77eA2ccE9IhK2TbS7/zFRbdVR1xDcT2KVuxeEfxfJXl2fSJ+Z2Rh3nwVgZicS+2acCaFkIRL4CcEeW/3MbAbBrWEvSW5IjYe7l5rZl8CQsAuwUQinrDvQFLjKzNaFr3sB1druPN4azV+KSBT9CFYDH0OwIOpE9P8jYcJp0pcRfEGWzSxyErhNfZKcm+wAYqUxCxEOG2A9mWBW1J+B/6nuFhRSM2a2DBjuEbfTlbolJdkBiNQRR8yTB5olMZ7GZhVBV4zUUbrMFglsMLNHCObJ3xXeT0O/TCVOATAvXMkeuTdUnbmfQ2OnbigRkr/OoLEzswkVlbv7k4mORSqmZCEidYKZNQMGhi+XufuBZMYjh1OyEJGkM7PTgCcJ9kQygllpE9y9oc+GqjeULEQk6cwsB/iWuy8LXw8EnmssK7jrAw3giUhd0LQsUQC4+3I0O6pO0WwoEakLss3sUeCf4etvE2zbLXWEuqFEJOnCqcrXE2zRbQQrtx/SIr26Q8lCRESiUjeUiCSNmb3g7pdGbKh3GHcfnoSwpAK6shCRpDGzDHffZGa9Kjru7msTHZNUTLOhRCRp3H1T+PQ6d18b+SC4zajUEUoWIlIXfK2CsnMSHoVUSmMWIpI0ZnYtwRVEXzNbEHGoDXXsTnGNncYsRCRpzKwd0AH4A3BLxKF8d9+enKikIkoWIlJnmFkXoEXZa3dfl8RwJILGLEQk6czsPDNbAawGPiTYUPDNpAYlh1GyEJG64HfAGGC5u/cBzkBjFnWKkoWI1AUH3H0bkGJmKe4+FRiZ5JgkgmZDiUhdsNPMWhPsCfWMmW0BipMck0TQALeIJJ2ZtQL2E2wi+G2gHfBMeLUhdYCShYiIRKVuKBFJGjPL5/ANBC18bYC7e9ukBCZH0JWFiIhEpdlQIlInmNnJZnZ1+DzdzPokOyY5RFcWIpJ0ZnYbkAUMcveBZtYdeNHdT0pyaBLSlYWI1AXfAM4H9gK4+0aCzQSljlCyEJG6oMiDbg6Hg1NppQ5RshCRpDIzA6aY2SNAezP7HvAe8PfkRiaRNGYhIklnZnOBm4F/I5g2+7a7v5vcqCSS1lmISF0wE9jp7j9LdiBSMV1ZiEjSmdkSYCCwlnCQG8DdhyctKDmMkoWIJJ2Z9aqo3N3XJjoWqZiShYiIRKXZUCIiEpWShYiIRKVkIRKFmf3czBab2QIzm2dmJ8axrWlmlhWv84vUlKbOilTBzMYC5wInuHuhmaUDzZIclkjC6cpCpGoZQJ67FwK4e567bzSzX5nZp2a2yMz+Fq5CLrsyuMfMPjKzz81slJm9YmYrzOx34Xt6m9lSM3syvFp5yczSyjdsZv9mZjPNbK6ZvRjedhQzu9PMloR1/5TAz0IaMSULkaq9AxxjZsvN7CEzOzUsf8DdR7n7MKAlwdVHmSJ3/yrwV+A14HpgGPBdM+sUvmcQ8LdwHcFu4LrIRsMrmF8AZ7r7CUA28BMz60iw6d7QsO7v4vBnFjmCkoVIFdx9D5AJTAS2Av9nZt8FTjez2Wa2EBgPDI2oNin8uRBY7O6bwiuTVcAx4bEv3H1G+Pxp4ORyTY8BhgAzzGweMAHoRZBY9gP/ML
OLgILa+rOKVEVjFiJRuHsJMA2YFiaH7wPDgSx3/8LMfg20iKhSGP4sjXhe9rrs/1z5BU7lXxvwrrtfUT4eMxsNnAFcDvyAIFmJxJWuLESqYGaDzGxARNFIYFn4PC8cR7ikBqc+Nhw8B7gCmF7u+CzgJDPrH8aRZmYDw/baufsbwI1hPCJxpysLkaq1Bu43s/ZAMZBL0CW1k6CbaQ3waQ3O+zkwIdyWewXwcORBd98adnc9Z2bNw+JfAPnAa2bWguDq48c1aFuk2rTdh0iCmVlvYEo4OC5SL6gbSkREotKVhYiIRKUrCxERiUrJQkREolKyEBGRqJQsREQkKiULERGJ6v8Bx1q7GzlnypEAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fdist.plot(20,cumulative=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hemos visto que el token más común es ',', seguido de '.' debido a que no hemos preprocesado los datos (eliminación de stopwords, ...)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.Extracción de textos a través de BeautifulSoup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "La librería BeautifulSoup tiene unas potentes funciones para el tratamiento de textos. Vamos usarla para limpiar el texto descargado desde una web." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "import nltk\n", "import urllib.request" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'\\n\\n\\n\\n \\n \\n\\n PHP: Hypertext Preprocessor\\n\\n \\n \\n \\n \\n\\n \\n \\n \\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n\\n \\n\\n \\n\\n \\n\\n \\n\\n\\n\\n\\n\\n\\n\\n
\\n
\\n
\\n
    \\n
    \\n
    \\n\\n\\n\\n
    \\n
    \\n
    \\n
    \\n

    PHP is a popular general-purpose scripting language that is especially suited to web development.

    \\n

    Fast, flexible and pragmatic, PHP powers everything from your blog to the most popular websites in the world.

    \\n
    \\n \\n
    \\n
    \\n\\n\\n
    \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.0.18 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.0.18. This is a bug fix release.

    \\n\\n

    All PHP 8.0 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.0.18 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.1.5 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.1.5. This is a bug fix release.

    \\n\\n

    All PHP 8.1 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.1.5 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 7.4.29 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 7.4.29. This is a security release for Windows users.

    \\n\\n

    This is primarily a release for Windows users due to necessarily\\nupgrades to the OpenSSL and zlib dependencies in which security issues\\nhave been found. All PHP 7.4 on Windows users are encouraged to upgrade\\nto this version.

    \\n\\n

    For source downloads of PHP 7.4.29 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.1.4 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.1.4. This is a bug fix release.

    \\n\\n

    All PHP 8.1 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.1.4 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.0.17 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.0.17. This is a bug fix release.

    \\n\\n

    All PHP 8.0 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.0.17 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.1.3 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.1.3. This is a security release.

    \\n\\n

    All PHP 8.1 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.1.3 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.0.16 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.0.16. This is a security release.

    \\n\\n

    All PHP 8.0 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.0.16 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 7.4.28 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 7.4.28. This is a security release.

    \\n\\n

    All PHP 7.4 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 7.4.28 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.1.2 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.1.2. This is a bug fix release.

    \\n\\n

    All PHP 8.1 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.1.2 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.0.15 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.0.15. This is a bug fix release.

    \\n\\n

    All PHP 8.0 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.0.15 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.1.1 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.1.1. This is a bug fix release.

    \\n\\n

    All PHP 8.1 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.1.1 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.0.14 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.0.14. This is a bug fix release.

    \\n\\n

    All PHP 8.0 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.0.14 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 7.4.27 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 7.4.27. This is a bug fix release.

    \\n\\n

    All PHP 7.4 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 7.4.27 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.1.0 Released!\\n

    \\n
    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.1.0. This release marks the latest minor release of the PHP language.

    \\n\\n

    PHP 8.1 comes with numerous improvements and new features such as:

    \\n\\n\\n

    Take a look at the PHP 8.1 Announcement Addendum for more information.

    \\n\\n

    For source downloads of PHP 8.1.0 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n\\n

    The migration guide is available in the PHP Manual.\\nPlease consult it for the detailed list of new features and backward incompatible changes.

    \\n\\n

    Many thanks to all the contributors and supporters!

    \\n
    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP Foundation Announced\\n

    \\n
    \\n
    \\n
    \\n

    \\n The PHP Foundation has been\\n announced\\n as an entity for funding the work of developing the PHP language.\\n

    \\n

    \\n For more information regarding the structure and purpose of the foundation,\\n please check out the blog post at:\\n jetbrains.com.\\n

    \\n
    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.0.13 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.0.13. This is a security release.

    \\n\\n

    All PHP 8.0 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.0.13 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 7.3.33 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 7.3.33. This is a security release.

    \\n\\n

    All PHP 7.3 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 7.3.33 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 7.4.26 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 7.4.26. This is a security release.

    \\n\\n

    All PHP 7.4 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 7.4.26 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.1.0 RC 6 available for testing\\n

    \\n
    \\n
    \\n
    \\n

    \\n The PHP team is pleased to announce the release of PHP 8.1.0, RC 6.\\n This is the sixth and final release candidate, continuing the PHP 8.1\\n release cycle, the rough outline of which is specified in the\\n PHP Wiki.\\n

    \\n

    \\n For source downloads of PHP 8.1.0, RC 6 please visit the\\n download page.\\n

    \\n

    \\n Please carefully test this version and report any issues found in the\\n bug reporting system.\\n

    \\n

    Please DO NOT use this version in production, it is an early test version.

    \\n

    \\n For more information on the new features and other changes, you can read the\\n NEWS file\\n or the UPGRADING\\n file for a complete list of upgrading notes. These files can also be\\n found in the release archive.\\n

    \\n

    \\n The next release will be the production-ready, general availability\\n release, planned for 25 November 2021.\\n

    \\n

    \\n The signatures for the release can be found in\\n the manifest\\n or on the QA site.\\n

    \\n

    Thank you for helping us make PHP better.

    \\n
    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.1.0 RC 5 available for testing\\n

    \\n
    \\n
    \\n
    \\n

    \\n The PHP team is pleased to announce the release of PHP 8.1.0, RC 5.\\n This is the fifth release candidate, continuing the PHP 8.1 release cycle,\\n the rough outline of which is specified in the\\n PHP Wiki.\\n

    \\n

    \\n For source downloads of PHP 8.1.0, RC 5 please visit the\\n download page.\\n

    \\n

    \\n Please carefully test this version and report any issues found in the\\n bug reporting system.\\n

    \\n

    Please DO NOT use this version in production, it is an early test version.

    \\n

    \\n For more information on the new features and other changes, you can read the\\n NEWS file\\n or the UPGRADING\\n file for a complete list of upgrading notes. These files can also be\\n found in the release archive.\\n

    \\n

    \\n The next release will be the sixth and last release candidate (RC 6), planned\\n for 11 November 2021.\\n

    \\n

    \\n The signatures for the release can be found in\\n the manifest\\n or on the QA site.\\n

    \\n

    Thank you for helping us make PHP better.

    \\n
    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 7.3.32 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 7.3.32. This is a security release.

    \\n\\n

    All PHP 7.3 FPM users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 7.3.32 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 7.4.25 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 7.4.25. This is a security release.

    \\n\\n

    All PHP 7.4 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 7.4.25 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.0.12 Released!\\n

    \\n
    \\n
    \\n

    The PHP development team announces the immediate availability of PHP 8.0.12. This is a security fix release.

    \\n\\n

    All PHP 8.0 users are encouraged to upgrade to this version.

    \\n\\n

    For source downloads of PHP 8.0.12 please visit our downloads page,\\nWindows source and binaries can be found on windows.php.net/download/.\\nThe list of changes is recorded in the ChangeLog.\\n

    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.1.0 RC 4 available for testing\\n

    \\n
    \\n
    \\n
    \\n

    \\n The PHP team is pleased to announce the release of PHP 8.1.0, RC 4.\\n This is the fourth release candidate, continuing the PHP 8.1 release cycle,\\n the rough outline of which is specified in the\\n PHP Wiki.\\n

    \\n

    \\n For source downloads of PHP 8.1.0, RC 4 please visit the\\n download page.\\n

    \\n

    \\n Please carefully test this version and report any issues found in the\\n bug reporting system.\\n

    \\n

    Please DO NOT use this version in production, it is an early test version.

    \\n

    \\n For more information on the new features and other changes, you can read the\\n NEWS file\\n or the UPGRADING\\n file for a complete list of upgrading notes. These files can also be\\n found in the release archive.\\n

    \\n

    \\n The next release will be the fifth release candidate (RC 5), planned\\n for 28 October 2021.\\n

    \\n

    \\n The signatures for the release can be found in\\n the manifest\\n or on the QA site.\\n

    \\n

    Thank you for helping us make PHP better.

    \\n
    \\n \\n
    \\n
    \\n
    \\n \\n

    \\n PHP 8.1.0 RC 3 available for testing\\n

    \\n
    \\n
    \\n
    \\n

    \\n The PHP team is pleased to announce the release of PHP 8.1.0, RC 3.\\n This is the third release candidate, continuing the PHP 8.1 release cycle,\\n the rough outline of which is specified in the\\n PHP Wiki.\\n

    \\n

    \\n For source downloads of PHP 8.1.0, RC 3 please visit the\\n download page.\\n

    \\n

    \\n Please carefully test this version and report any issues found in the\\n bug reporting system.\\n

    \\n

    Please DO NOT use this version in production, it is an early test version.

    \\n

    \\n For more information on the new features and other changes, you can read the\\n NEWS file\\n or the UPGRADING\\n file for a complete list of upgrading notes. These files can also be\\n found in the release archive.\\n

    \\n

    \\n The next release will be the fourth release candidate (RC 4), planned\\n for 14 October 2021.\\n

    \\n

    \\n The signatures for the release can be found in\\n the manifest\\n or on the QA site.\\n

    \\n

    Thank you for helping us make PHP better.

    \\n
    \\n \\n
    \\n

    Older News Entries

    \\n \\n\\n\\n
    \\n\\n \\n\\n
    \\n \\n \\n\\n\\n\\n\\n\\n\\n\\n\"To\\n\\n\\n\\n'\n" ] } ], "source": [ "# Download the raw HTML of the page (read() returns bytes)\n", "response = urllib.request.urlopen('http://php.net/')\n", "html = response.read()\n", "print(html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result contains many HTML tags that need to be removed. In this case we will use BeautifulSoup." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PHP: Hypertext PreprocessorDownloadsDocumentationGet InvolvedHelpGetting StartedIntroductionA simple tutorialLanguage ReferenceBasic syntaxTypesVariablesConstantsExpressionsOperatorsControl StructuresFunctionsClasses and ObjectsNamespacesEnumerationsErrorsExceptionsFibersGeneratorsAttributesReferences ExplainedPredefined VariablesPredefined ExceptionsPredefined Interfaces and ClassesContext options and parametersSupported Protocols and WrappersSecurityIntroductionGeneral considerationsInstalled as CGI binaryInstalled as an Apache moduleSession SecurityFilesystem SecurityDatabase SecurityError ReportingUser Submitted DataHiding PHPKeeping CurrentFeaturesHTTP authentication with PHPCookiesSessionsDealing with XFormsHandling file uploadsUsing remote filesConnection handlingPersistent Database ConnectionsCommand line usageGarbage CollectionDTrace Dynamic TracingFunction ReferenceAffecting PHP's BehaviourAudio Formats ManipulationAuthentication ServicesCommand Line Specific ExtensionsCompression and Archive ExtensionsCryptography ExtensionsDatabase ExtensionsDate and Time Related ExtensionsFile System Related ExtensionsHuman Language and Character Encoding SupportImage Processing and GenerationMail Related ExtensionsMathematical ExtensionsNon-Text MIME OutputProcess Control ExtensionsOther Basic ExtensionsOther ServicesSearch Engine ExtensionsServer Specific ExtensionsSession ExtensionsText ProcessingVariable and Type Related ExtensionsWeb ServicesWindows Only ExtensionsXML 
ManipulationGUI ExtensionsKeyboard Shortcuts?This helpjNext menu itemkPrevious menu itemg pPrevious man pageg nNext man pageGScroll to bottomg gScroll to topg hGoto homepageg sGoto search(current page)/Focus search boxPHP is a popular general-purpose scripting language that is especially suited to web development.Fast, flexible and pragmatic, PHP powers everything from your blog to the most popular websites in the world.Download8.1.5·Release Notes·Upgrading8.0.18·Release Notes·Upgrading7.4.29·Release Notes·Upgrading15 Apr 2022PHP 8.0.18 Released!The PHP development team announces the immediate availability of PHP 8.0.18. This is a bug fix release.All PHP 8.0 users are encouraged to upgrade to this version.For source downloads of PHP 8.0.18 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.14 Apr 2022PHP 8.1.5 Released!The PHP development team announces the immediate availability of PHP 8.1.5. This is a bug fix release.All PHP 8.1 users are encouraged to upgrade to this version.For source downloads of PHP 8.1.5 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.14 Apr 2022PHP 7.4.29 Released!The PHP development team announces the immediate availability of PHP 7.4.29. This is a security release for Windows users.This is primarily a release for Windows users due to necessarily\n", "upgrades to the OpenSSL and zlib dependencies in which security issues\n", "have been found. All PHP 7.4 on Windows users are encouraged to upgrade\n", "to this version.For source downloads of PHP 7.4.29 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.17 Mar 2022PHP 8.1.4 Released!The PHP development team announces the immediate availability of PHP 8.1.4. 
This is a bug fix release.All PHP 8.1 users are encouraged to upgrade to this version.For source downloads of PHP 8.1.4 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.17 Mar 2022PHP 8.0.17 Released!The PHP development team announces the immediate availability of PHP 8.0.17. This is a bug fix release.All PHP 8.0 users are encouraged to upgrade to this version.For source downloads of PHP 8.0.17 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.17 Feb 2022PHP 8.1.3 Released!The PHP development team announces the immediate availability of PHP 8.1.3. This is a security release.All PHP 8.1 users are encouraged to upgrade to this version.For source downloads of PHP 8.1.3 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.17 Feb 2022PHP 8.0.16 Released!The PHP development team announces the immediate availability of PHP 8.0.16. This is a security release.All PHP 8.0 users are encouraged to upgrade to this version.For source downloads of PHP 8.0.16 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.17 Feb 2022PHP 7.4.28 Released!The PHP development team announces the immediate availability of PHP 7.4.28. This is a security release.All PHP 7.4 users are encouraged to upgrade to this version.For source downloads of PHP 7.4.28 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.21 Jan 2022PHP 8.1.2 Released!The PHP development team announces the immediate availability of PHP 8.1.2. 
This is a bug fix release.All PHP 8.1 users are encouraged to upgrade to this version.For source downloads of PHP 8.1.2 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.20 Jan 2022PHP 8.0.15 Released!The PHP development team announces the immediate availability of PHP 8.0.15. This is a bug fix release.All PHP 8.0 users are encouraged to upgrade to this version.For source downloads of PHP 8.0.15 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.17 Dec 2021PHP 8.1.1 Released!The PHP development team announces the immediate availability of PHP 8.1.1. This is a bug fix release.All PHP 8.1 users are encouraged to upgrade to this version.For source downloads of PHP 8.1.1 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.16 Dec 2021PHP 8.0.14 Released!The PHP development team announces the immediate availability of PHP 8.0.14. This is a bug fix release.All PHP 8.0 users are encouraged to upgrade to this version.For source downloads of PHP 8.0.14 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.16 Dec 2021PHP 7.4.27 Released!The PHP development team announces the immediate availability of PHP 7.4.27. This is a bug fix release.All PHP 7.4 users are encouraged to upgrade to this version.For source downloads of PHP 7.4.27 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.25 Nov 2021PHP 8.1.0 Released!The PHP development team announces the immediate availability of PHP 8.1.0. 
This release marks the latest minor release of the PHP language.PHP 8.1 comes with numerous improvements and new features such as:EnumerationsReadonly propertiesFibersPure Intersection Typesneverreturn typeFirst-class Callable Syntax\"final\" modifier for class constantsNewfsyncandfdatasyncfunctionsNewarray_is_listfunctionExplicitOctal numeral notationAnd much much more...Take a look at thePHP 8.1 Announcement Addendumfor more information.For source downloads of PHP 8.1.0 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.Themigration guideis available in the PHP Manual.\n", "Please consult it for the detailed list of new features and backward incompatible changes.Many thanks to all the contributors and supporters!22 Nov 2021PHP Foundation AnnouncedThe PHP Foundationhas beenannouncedas an entity for funding the work of developing the PHP language.For more information regarding the structure and purpose of the foundation,\n", " please check out the blog post at:jetbrains.com.19 Nov 2021PHP 8.0.13 Released!The PHP development team announces the immediate availability of PHP 8.0.13. This is a security release.All PHP 8.0 users are encouraged to upgrade to this version.For source downloads of PHP 8.0.13 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.18 Nov 2021PHP 7.3.33 Released!The PHP development team announces the immediate availability of PHP 7.3.33. This is a security release.All PHP 7.3 users are encouraged to upgrade to this version.For source downloads of PHP 7.3.33 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.18 Nov 2021PHP 7.4.26 Released!The PHP development team announces the immediate availability of PHP 7.4.26. 
This is a security release.All PHP 7.4 users are encouraged to upgrade to this version.For source downloads of PHP 7.4.26 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.11 Nov 2021PHP 8.1.0 RC 6 available for testingThe PHP team is pleased to announce the release of PHP 8.1.0, RC 6.\n", " This is the sixth and final release candidate, continuing the PHP 8.1\n", " release cycle, the rough outline of which is specified in thePHP Wiki.For source downloads of PHP 8.1.0, RC 6 please visit thedownload page.Please carefully test this version and report any issues found in thebug reporting system.Please DO NOT use this version in production, it is an early test version.For more information on the new features and other changes, you can read theNEWSfile\n", " or theUPGRADINGfile for a complete list of upgrading notes. These files can also be\n", " found in the release archive.The next release will be the production-ready, general availability\n", " release, planned for 25 November 2021.The signatures for the release can be found inthe manifestor onthe QA site.Thank you for helping us make PHP better.28 Oct 2021PHP 8.1.0 RC 5 available for testingThe PHP team is pleased to announce the release of PHP 8.1.0, RC 5.\n", " This is the fifth release candidate, continuing the PHP 8.1 release cycle,\n", " the rough outline of which is specified in thePHP Wiki.For source downloads of PHP 8.1.0, RC 5 please visit thedownload page.Please carefully test this version and report any issues found in thebug reporting system.Please DO NOT use this version in production, it is an early test version.For more information on the new features and other changes, you can read theNEWSfile\n", " or theUPGRADINGfile for a complete list of upgrading notes. 
These files can also be\n", " found in the release archive.The next release will be the sixth and last release candidate (RC 6), planned\n", " for 11 November 2021.The signatures for the release can be found inthe manifestor onthe QA site.Thank you for helping us make PHP better.28 Oct 2021PHP 7.3.32 Released!The PHP development team announces the immediate availability of PHP 7.3.32. This is a security release.All PHP 7.3 FPM users are encouraged to upgrade to this version.For source downloads of PHP 7.3.32 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.22 Oct 2021PHP 7.4.25 Released!The PHP development team announces the immediate availability of PHP 7.4.25. This is a security release.All PHP 7.4 users are encouraged to upgrade to this version.For source downloads of PHP 7.4.25 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.21 Oct 2021PHP 8.0.12 Released!The PHP development team announces the immediate availability of PHP 8.0.12. 
This is a security fix release.All PHP 8.0 users are encouraged to upgrade to this version.For source downloads of PHP 8.0.12 please visit ourdownloads page,\n", "Windows source and binaries can be found onwindows.php.net/download/.\n", "The list of changes is recorded in theChangeLog.14 Oct 2021PHP 8.1.0 RC 4 available for testingThe PHP team is pleased to announce the release of PHP 8.1.0, RC 4.\n", " This is the fourth release candidate, continuing the PHP 8.1 release cycle,\n", " the rough outline of which is specified in thePHP Wiki.For source downloads of PHP 8.1.0, RC 4 please visit thedownload page.Please carefully test this version and report any issues found in thebug reporting system.Please DO NOT use this version in production, it is an early test version.For more information on the new features and other changes, you can read theNEWSfile\n", " or theUPGRADINGfile for a complete list of upgrading notes. These files can also be\n", " found in the release archive.The next release will be the fifth release candidate (RC 5), planned\n", " for 28 October 2021.The signatures for the release can be found inthe manifestor onthe QA site.Thank you for helping us make PHP better.30 Sep 2021PHP 8.1.0 RC 3 available for testingThe PHP team is pleased to announce the release of PHP 8.1.0, RC 3.\n", " This is the third release candidate, continuing the PHP 8.1 release cycle,\n", " the rough outline of which is specified in thePHP Wiki.For source downloads of PHP 8.1.0, RC 3 please visit thedownload page.Please carefully test this version and report any issues found in thebug reporting system.Please DO NOT use this version in production, it is an early test version.For more information on the new features and other changes, you can read theNEWSfile\n", " or theUPGRADINGfile for a complete list of upgrading notes. 
These files can also be\n", " found in the release archive.The next release will be the fourth release candidate (RC 4), planned\n", " for 14 October 2021.The signatures for the release can be found inthe manifestor onthe QA site.Thank you for helping us make PHP better.Older News EntriesUpcoming conferencesDutch PHP Conference 2022 - ScheduleInternational PHP Conference Berlin 2022PHP Russia 2022User Group EventsSpecial ThanksSocial media@official_phpCopyright © 2001-2022 The PHP GroupMy PHP.netContactOther PHP.net sitesPrivacy policyView Source\n" ] } ], "source": [ "from bs4 import BeautifulSoup\n", "import urllib.request\n", "\n", "# Download the page and strip the HTML markup, keeping only the text\n", "response = urllib.request.urlopen('http://php.net/')\n", "html = response.read()\n", "soup = BeautifulSoup(html, \"html5lib\")\n", "text = soup.get_text(strip=True)\n", "\n", "print(text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "BeautifulSoup has removed the HTML markup from the text. Note that `get_text(strip=True)` joins adjacent text nodes without a separator, which is why some words in the output appear fused (for example `ourdownloads`). Next, we extract the tokens from the text." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Hypertext',\n", " 'PreprocessorDownloadsDocumentationGet',\n", " 'InvolvedHelpGetting',\n", " 'StartedIntroductionA',\n", " 'simple',\n", " 'tutorialLanguage',\n", " 'ReferenceBasic',\n", " 'syntaxTypesVariablesConstantsExpressionsOperatorsControl',\n", " 'StructuresFunctionsClasses',\n", " 'and',\n", " 'ObjectsNamespacesEnumerationsErrorsExceptionsFibersGeneratorsAttributesReferences',\n", " 'ExplainedPredefined',\n", " 'VariablesPredefined',\n", " 'ExceptionsPredefined',\n", " 'Interfaces',\n", " 'and',\n", " 'ClassesContext',\n", " 'options',\n", " 'and']" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Split on whitespace to obtain the tokens\n", "tokens = text.split()\n", "tokens[1:20]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We count the word frequencies..."
] }, { "cell_type": "code", "execution_count": 35, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PHP::1\n", "Hypertext:1\n", "PreprocessorDownloadsDocumentationGet:1\n", "InvolvedHelpGetting:1\n", "StartedIntroductionA:1\n", "simple:1\n", "tutorialLanguage:1\n", "ReferenceBasic:1\n", "syntaxTypesVariablesConstantsExpressionsOperatorsControl:1\n", "StructuresFunctionsClasses:1\n", "and:45\n", "ObjectsNamespacesEnumerationsErrorsExceptionsFibersGeneratorsAttributesReferences:1\n", "ExplainedPredefined:1\n", "VariablesPredefined:1\n", "ExceptionsPredefined:1\n", "Interfaces:1\n", "ClassesContext:1\n", "options:1\n", "parametersSupported:1\n", "Protocols:1\n", "WrappersSecurityIntroductionGeneral:1\n", "considerationsInstalled:1\n", "as:2\n", "CGI:1\n", "binaryInstalled:1\n", "an:6\n", "Apache:1\n", "moduleSession:1\n", "SecurityFilesystem:1\n", "SecurityDatabase:1\n", "SecurityError:1\n", "ReportingUser:1\n", "Submitted:1\n", "DataHiding:1\n", "PHPKeeping:1\n", "CurrentFeaturesHTTP:1\n", "authentication:1\n", "with:3\n", "PHPCookiesSessionsDealing:1\n", "XFormsHandling:1\n", "file:1\n", "uploadsUsing:1\n", "remote:1\n", "filesConnection:1\n", "handlingPersistent:1\n", "Database:1\n", "ConnectionsCommand:1\n", "line:1\n", "usageGarbage:1\n", "CollectionDTrace:1\n", "Dynamic:1\n", "TracingFunction:1\n", "ReferenceAffecting:1\n", "PHP's:1\n", "BehaviourAudio:1\n", "Formats:1\n", "ManipulationAuthentication:1\n", "ServicesCommand:1\n", "Line:1\n", "Specific:2\n", "ExtensionsCompression:1\n", "Archive:1\n", "ExtensionsCryptography:1\n", "ExtensionsDatabase:1\n", "ExtensionsDate:1\n", "Time:1\n", "Related:4\n", "ExtensionsFile:1\n", "System:1\n", "ExtensionsHuman:1\n", "Language:1\n", "Character:1\n", "Encoding:1\n", "SupportImage:1\n", "Processing:1\n", "GenerationMail:1\n", "ExtensionsMathematical:1\n", "ExtensionsNon-Text:1\n", "MIME:1\n", "OutputProcess:1\n", "Control:1\n", "ExtensionsOther:2\n", 
"Basic:1\n", "ServicesSearch:1\n", "Engine:1\n", "ExtensionsServer:1\n", "ExtensionsSession:1\n", "ExtensionsText:1\n", "ProcessingVariable:1\n", "Type:1\n", "ExtensionsWeb:1\n", "ServicesWindows:1\n", "Only:1\n", "ExtensionsXML:1\n", "ManipulationGUI:1\n", "ExtensionsKeyboard:1\n", "Shortcuts?This:1\n", "helpjNext:1\n", "menu:2\n", "itemkPrevious:1\n", "itemg:1\n", "pPrevious:1\n", "man:2\n", "pageg:1\n", "nNext:1\n", "pageGScroll:1\n", "to:49\n", "bottomg:1\n", "gScroll:1\n", "topg:1\n", "hGoto:1\n", "homepageg:1\n", "sGoto:1\n", "search(current:1\n", "page)/Focus:1\n", "search:1\n", "boxPHP:1\n", "is:58\n", "a:26\n", "popular:2\n", "general-purpose:1\n", "scripting:1\n", "language:1\n", "that:1\n", "especially:1\n", "suited:1\n", "web:1\n", "development.Fast,:1\n", "flexible:1\n", "pragmatic,:1\n", "PHP:107\n", "powers:1\n", "everything:1\n", "from:1\n", "your:1\n", "blog:2\n", "the:65\n", "most:1\n", "websites:1\n", "in:39\n", "world.Download8.1.5·Release:1\n", "Notes·Upgrading8.0.18·Release:1\n", "Notes·Upgrading7.4.29·Release:1\n", "Notes·Upgrading15:1\n", "Apr:3\n", "2022PHP:11\n", "8.0.18:2\n", "Released!The:20\n", "development:20\n", "team:24\n", "announces:20\n", "immediate:20\n", "availability:21\n", "of:80\n", "8.0.18.:1\n", "This:24\n", "bug:9\n", "fix:10\n", "release.All:18\n", "8.0:7\n", "users:20\n", "are:19\n", "encouraged:19\n", "upgrade:19\n", "this:27\n", "version.For:23\n", "source:44\n", "downloads:24\n", "please:25\n", "visit:24\n", "ourdownloads:20\n", "page,:20\n", "Windows:23\n", "binaries:20\n", "can:32\n", "be:32\n", "found:32\n", "onwindows.php.net/download/.:20\n", "The:21\n", "list:25\n", "changes:20\n", "recorded:20\n", "theChangeLog.14:3\n", "8.1.5:2\n", "8.1.5.:1\n", "8.1:11\n", "7.4.29:2\n", "7.4.29.:1\n", "security:11\n", "release:31\n", "for:25\n", "users.This:1\n", "primarily:1\n", "due:1\n", "necessarily:1\n", "upgrades:1\n", "OpenSSL:1\n", "zlib:1\n", "dependencies:1\n", "which:5\n", "issues:5\n", "have:1\n", "been:1\n", 
"found.:1\n", "All:1\n", "7.4:5\n", "on:5\n", "theChangeLog.17:6\n", "Mar:2\n", "8.1.4:2\n", "8.1.4.:1\n", "8.0.17:2\n", "8.0.17.:1\n", "Feb:3\n", "8.1.3:2\n", "8.1.3.:1\n", "8.0.16:2\n", "8.0.16.:1\n", "7.4.28:2\n", "7.4.28.:1\n", "theChangeLog.21:2\n", "Jan:2\n", "8.1.2:2\n", "8.1.2.:1\n", "theChangeLog.20:1\n", "8.0.15:2\n", "8.0.15.:1\n", "Dec:3\n", "2021PHP:15\n", "8.1.1:2\n", "8.1.1.:1\n", "theChangeLog.16:2\n", "8.0.14:2\n", "8.0.14.:1\n", "7.4.27:2\n", "7.4.27.:1\n", "theChangeLog.25:1\n", "Nov:6\n", "8.1.0:6\n", "8.1.0.:1\n", "marks:1\n", "latest:1\n", "minor:1\n", "language.PHP:1\n", "comes:1\n", "numerous:1\n", "improvements:1\n", "new:6\n", "features:6\n", "such:1\n", "as:EnumerationsReadonly:1\n", "propertiesFibersPure:1\n", "Intersection:1\n", "Typesneverreturn:1\n", "typeFirst-class:1\n", "Callable:1\n", "Syntax\"final\":1\n", "modifier:1\n", "class:1\n", "constantsNewfsyncandfdatasyncfunctionsNewarray_is_listfunctionExplicitOctal:1\n", "numeral:1\n", "notationAnd:1\n", "much:2\n", "more...Take:1\n", "look:1\n", "at:1\n", "thePHP:5\n", "Announcement:1\n", "Addendumfor:1\n", "more:6\n", "information.For:1\n", "theChangeLog.Themigration:1\n", "guideis:1\n", "available:5\n", "Manual.:1\n", "Please:1\n", "consult:1\n", "it:5\n", "detailed:1\n", "backward:1\n", "incompatible:1\n", "changes.Many:1\n", "thanks:1\n", "all:1\n", "contributors:1\n", "supporters!22:1\n", "Foundation:1\n", "AnnouncedThe:1\n", "Foundationhas:1\n", "beenannouncedas:1\n", "entity:1\n", "funding:1\n", "work:1\n", "developing:1\n", "language.For:1\n", "information:5\n", "regarding:1\n", "structure:1\n", "purpose:1\n", "foundation,:1\n", "check:1\n", "out:1\n", "post:1\n", "at:jetbrains.com.19:1\n", "8.0.13:2\n", "8.0.13.:1\n", "theChangeLog.18:2\n", "7.3.33:2\n", "7.3.33.:1\n", "7.3:2\n", "7.4.26:2\n", "7.4.26.:1\n", "theChangeLog.11:1\n", "RC:12\n", "6:2\n", "testingThe:4\n", "pleased:4\n", "announce:4\n", "8.1.0,:8\n", "6.:1\n", "sixth:2\n", "final:1\n", "candidate,:4\n", 
"continuing:4\n", "cycle,:4\n", "rough:4\n", "outline:4\n", "specified:4\n", "Wiki.For:4\n", "thedownload:4\n", "page.Please:4\n", "carefully:4\n", "test:8\n", "version:8\n", "report:4\n", "any:4\n", "thebug:4\n", "reporting:4\n", "system.Please:4\n", "DO:4\n", "NOT:4\n", "use:4\n", "production,:4\n", "early:4\n", "other:4\n", "changes,:4\n", "you:8\n", "read:4\n", "theNEWSfile:4\n", "or:4\n", "theUPGRADINGfile:4\n", "complete:4\n", "upgrading:4\n", "notes.:4\n", "These:4\n", "files:4\n", "also:4\n", "archive.The:4\n", "next:4\n", "will:4\n", "production-ready,:1\n", "general:1\n", "release,:1\n", "planned:4\n", "25:1\n", "November:2\n", "2021.The:4\n", "signatures:4\n", "inthe:4\n", "manifestor:4\n", "onthe:4\n", "QA:4\n", "site.Thank:4\n", "helping:4\n", "us:4\n", "make:4\n", "better.28:2\n", "Oct:5\n", "5:2\n", "5.:1\n", "fifth:2\n", "last:1\n", "candidate:3\n", "(RC:3\n", "6),:1\n", "11:1\n", "7.3.32:2\n", "7.3.32.:1\n", "FPM:1\n", "theChangeLog.22:1\n", "7.4.25:2\n", "7.4.25.:1\n", "8.0.12:2\n", "8.0.12.:1\n", "4:2\n", "4.:1\n", "fourth:2\n", "5),:1\n", "28:1\n", "October:2\n", "better.30:1\n", "Sep:1\n", "3:2\n", "3.:1\n", "third:1\n", "4),:1\n", "14:1\n", "better.Older:1\n", "News:1\n", "EntriesUpcoming:1\n", "conferencesDutch:1\n", "Conference:2\n", "2022:1\n", "-:1\n", "ScheduleInternational:1\n", "Berlin:1\n", "Russia:1\n", "2022User:1\n", "Group:1\n", "EventsSpecial:1\n", "ThanksSocial:1\n", "media@official_phpCopyright:1\n", "©:1\n", "2001-2022:1\n", "GroupMy:1\n", "PHP.netContactOther:1\n", "PHP.net:1\n", "sitesPrivacy:1\n", "policyView:1\n", "Source:1\n" ] } ], "source": [ "freq = nltk.FreqDist(tokens)\n", "\n", "for key,val in freq.items():\n", " print (str(key) + ':' + str(val))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "¿Cuáles son los token más frecuentes?" 
] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEyCAYAAAAV7MyFAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAzUklEQVR4nO3deXxcdbnH8c+TrWnSdE3aBrpDadmhKRQpOyIqOyiioOWi4oIXFBf0igpevdcNr4KAIotsIouyFFTAIkuBUpLuC6Wl+0LpvqRLmvS5f5yTdJqmyZnMZE6S+b5fr3lN5sz8znmyzTO/3dwdERERgJy4AxARkfZDSUFERBooKYiISAMlBRERaaCkICIiDZQURESkQV7cAaSitLTUhwwZ0ury27dvp2vXriqv8iqv8llVvqqqaq27lzX5pLt32FtFRYWnorKyUuVVXuVVPuvKA5W+n/dVNR+JiEgDJQUREWmgpCAiIg2UFEREpIGSgoiINFBSEBGRBlmZFKp31vLKu2tYuaU27lBERNqVrEwK//fiu4y7dzKvLNkedygiIu1KViaFE4b1AWD2mpqYIxERaV+yMikcN7Q3ZjB/3S6219TFHY6ISLuRlUmhR9d8Dj+gO7UOU5ZuiDscEZF2IyuTAsCYoUET0lsL18UciYhI+5G1SaG+X2HSwvUxRyIi0n5kbVI4fkhvDJi2bKP6FUREQlmbFHoU5TOkZx41dbuZqn4FEREgi5MCwOFlBQBMUr+CiAigpADApEXqVxARgSxPCoeWFWAG05ZuZMcu9SuIiGR1UigpyOHQ/t2pqdut+QoiImR5UgANTRURSZT1SWHMsN6AJrGJiEAbJgUzu9fMPjCzWQnHepvZi2Y2P7zvlfDc98xsgZnNM7Oz2yquxsaE6yBNXaZ+BRGRtqwp/An4aKNj3wUmuPtwYEL4GDM7DLgMODwsc4eZ5bZhbA16FhUwsn93amp3M3XpxkxcUkSk3WqzpODurwKNG+ovAO4Pv74fuDDh+F/cfae7LwIWAMe3VWyNnRA2IWm+gohkO3P3tju52RDgWXc/Iny80d17Jjy/wd17mdnvgEnu/lB4/B7gH+7+RBPnvBq4GqC8vLxi/PjxrY5v27ZtFBUV8daKHfzijY0cXlbAj0/rnXT5VK+v8iqv8iqfyfKjR4+ucvfRTT7p7m12A4YAsxIeb2z0/Ibw/nbgioTj9wCXtHT+iooKT0VlZaW7u6/futMH3/CsD//+3317TW3S5VO9vsqrvMqrfCbLA5W+n/fVTI8+Wm1m5QDh/Qfh8eXAwITXDQBWZiqoXsUFjOxfQk3tbqYt25ipy4qItDuZTgrPAOPCr8cBTyccv8zMupjZUGA4MDmTge2Zr6B+BRHJXm05JPUR4E1ghJktN7PPAz8DzjKz+cBZ4WPcfTbwGDAH+CdwjbtndHyokoKICOS11Ynd/dP7eerM/bz+p8BP2yqelhw/NOhgnhqug1SYn5ERsSIi7UrWz2iu1zvsV9hZu5vp6lcQkSylpJBA6yCJSLZTUkigSWwiku2UFBIcPzSoKUxZuoGdtVoHSUSyj5JCgt7FBYzoV9+vsCnucEREMk5JoRE1IYlINlNSaETzFUQkmykpNFI/X0H9CiKSjZQUGunTrQuH9OvGjl27mbFc/Qoikl2UFJrQ0IT0npqQRCS7KCk0oSEpLFJSEJHsoqTQhPp+haolG6ip3R1zNCIimaOk0ITSvfoVNsYdjohIxigp7MeYoRqaKiLZR0lhP7Q4nohkIyWF/RgTzmyuXLJe/QoikjWUFPajtFsXhvcN+hVmrtgYdzgiIhmhpNCMMQ3rIKkJSUSyg5JCM7QOkohkGyWFZtSPQKpcvIFddepXEJHOT0mhGWUlXTi4bze276rTOk
gikhWUFFqg/RVEJJsoKbRAk9hEJJsoKbSgYb6C+hVEJAsoKbSgb0khB5UVq19BRLKCkkIE9UNT39JS2iLSySkpRDBG6yCJSJZQUojghKH1/Qrr1a8gIp2akkIEfbsXMqysmG01dcxcoX4FEem8lBQiauhXUBOSiHRiSgoRaR0kEckGSgoRqV9BRLKBkkJEfbsXMqy0mOqaOmapX0FEOiklhSRoaKqIdHZKCkmoXxxPk9hEpLNSUkhCfWfz24vWU6t+BRHphJQUktCveyFD6/sVVm6OOxwRkbRTUkiS9lcQkc4slqRgZt8ws9lmNsvMHjGzQjPrbWYvmtn88L5XHLG1ZM8kNiUFEel8Mp4UzOxA4FpgtLsfAeQClwHfBSa4+3BgQvi43anfdOftxRuo2+0xRyMikl5xNR/lAV3NLA8oAlYCFwD3h8/fD1wYT2jN69+jkCF9iti6s5ZFG2vjDkdEJK0ynhTcfQXwK2ApsArY5O4vAP3cfVX4mlVA30zHFlV9E9LMD3bGHImISHqZe2abQMK+gr8CnwI2Ao8DTwC/c/eeCa/b4O779CuY2dXA1QDl5eUV48ePb3Us27Zto6ioKOlyk5bv4JdvbqRfUQ63fbyMXLOMXl/lVV7lVT6V8qNHj65y99FNPunuGb0BnwTuSXj8OeAOYB5QHh4rB+a1dK6KigpPRWVlZavK7aqt8xP/d4IPvuFZf37WqoxfX+VVXuVVPpXyQKXv5301jj6FpcAJZlZkZgacCcwFngHGha8ZBzwdQ2yR5OXmcNVJQwG4+7VFMUcjIpI+cfQpvEXQXDQFmBnGcBfwM+AsM5sPnBU+brcuHT2Aojxj8uL1TF+2Me5wRETSIpbRR+7+I3cf6e5HuPtn3X2nu69z9zPdfXh4365XnSspzOfDw7oC8MfXFsYcjYhIemhGcwrOGV5MXo7xj1nvs3zDtrjDERFJmZJCCkqLcjnnqHLqdjt/en1x3OGIiKRMSSFFXzhpGAB/eXsZm3fsijkaEZHUKCmk6MgBPRgztDdbd9by6ORlcYcjIpISJYU0+OLJQW3hvtcXaf9mEenQlBTS4IyRfRlWVszKTTv4x6z34w5HRKTVlBTSICfH+HzDZLaF9TO1RUQ6HCWFNLlk1AB6FeUzY/kmJi9q11MsRET2S0khTQrzc/nsCYMB+KOWvhCRDkpJIY0++6EhFOTlMOGd1SxcszXucEREkqakkEZlJV246JgDcYd7X1dtQUQ6HiWFNPvCyUGH8xNVy9lQXRNzNCIiyVFSSLPh/Uo4bUQZO3bt5qFJS+IOR0QkKUoKbaB+6Yv731zCjl11MUcjIhJd0knBzHqZ2VFtEUxnMfbgPozsX8LarTt5ZvrKuMMREYksUlIws5fNrLuZ9QamA/eZ2a/bNrSOy8walr6457VFmswmIh1G1JpCD3ffDFwM3OfuFcCH2y6sju+8ow+gX/cuzFu9hVfnr407HBGRSKImhTwzKwcuBZ5tw3g6jYK8HMadOAQIlr4QEekIoiaFm4HngQXu/raZDQPmt11YncNnjh9E1/xcXpu/lrmrNscdjohIi6ImhVXufpS7fxXA3RcC6lNoQc+iAi4dPQCAeyZqMpuItH9Rk8JtEY9JI1edNBQzeHraCj7YvCPucEREmpXX3JNm9iHgRKDMzK5PeKo7kNuWgXUWg/sUc/Zh/fnn7Pe5/83FfPvskXGHJCKyXy3VFAqAbgTJoyThthn4RNuG1nl88ZRg6YuHJi1lW01tzNGIiOxfszUFd38FeMXM/uTuWrOhlUYN6sUxA3sybdlG/lq1nM9+aEjcIYmINClqn0IXM7vLzF4ws5fqb20aWSey12S2iYuo263JbCLSPjVbU0jwOPB74G5Ai/m0wtmH92NAr64sXreNf81dzdmH9487JBGRfUStKdS6+53uPtndq+pvbRpZJ5OXm8NVY/fs4ywi0h5FTQrjzeyrZlZuZr3rb20aWSd06XEDKSnM4+3FG5i2bG
Pc4YiI7CNqUhgHfBt4A6gKb5VtFVRn1a1LHp85fhCg2oKItE+RkoK7D23iNqytg+uMrhw7hLwc4x+z3ueDanXPiEj7Eqmj2cw+19Rxd38gveF0fuU9unLuUeU8NW0ld0/dzIfH7iY/V3sdiUj7EPXd6LiE28nATcD5bRRTp3ftmcPp0TWfqlU7+caj0zREVUTajUg1BXf/z8THZtYDeLBNIsoCw8q68cBVx3PZH97g2RmrKMzP5ReXHEVOjsUdmohkuda2W2wDhqczkGxz9MCefP/kXnTNz+WJquX88JlZ2qFNRGIXdTvO8Wb2THh7DpgHPN22oXV+h5YWcPe40RTk5fDQpKX89Lm5SgwiEquoM5p/lfB1LbDE3Ze3QTxZZ+zBpfzhigqufrCSuycuoqggl+s/MiLusEQkS0UdkvoK8A7BCqm9gJq2DCrbnD6yL7dediy5OcatLy3g9n8viDskEclSUZuPLgUmA58k2Kf5LTPT0tlp9LEjy7nlk0djBr98fh73aqc2EYlB1Oaj7wPHufsHAGZWBvwLeKKtAstGFx57IDtr67jhrzP58bNzKMzP5TNjBsUdlohkkaijj3LqE0JoXRJl92FmPc3sCTN7x8zmmtmHwvWUXjSz+eF9r9aevyP71HGDuOm8wwD4/lMz+dsUdd2ISOZEfWP/p5k9b2ZXmtmVwHPA31O47m+Bf7r7SOBoYC7wXWCCuw8HJoSPs9KVY4fy3Y+NxB2+9fh0npuxKu6QRCRLtLRH88FAP3f/tpldDJwEGPAm8HBrLmhm3YFTgCsB3L0GqDGzC4DTwpfdD7wM3NCaa3QGXz71ILbX1PHbCfO57i9TKczP4cxD+8Udloh0ci3VFH4DbAFw97+5+/Xu/g2CWsJvWnnNYcAa4D4zm2pmd5tZMUHyWRVeaxXQt5Xn7zS+/uHhfOmUYdTudr7y0BRem78m7pBEpJOz5iZLmdksdz9iP8/NdPcjk76g2WhgEjDW3d8ys98Cm4H/dPeeCa/b4O779CuY2dXA1QDl5eUV48ePTzaEBtu2baOoqKhdl3d37p66hX++t42CXLjx5N4cXlaQseurvMqrfOcrP3r06Cp3H93kk+6+3xuwoDXPtXDO/sDihMcnE/RRzAPKw2PlwLyWzlVRUeGpqKys7BDl6+p2+7cfn+aDb3jWD/vBP3zKkvUZvb7Kq7zKd67yQKXv5321peajt83si40PmtnnCTbaSZq7vw8sM7P6abtnAnOAZwg28yG81zIaoZwc438vPorzjz6A6po6xt07mdkrN8Udloh0Qi3NU/g68KSZXc6eJDAaKAAuSuG6/wk8bGYFwELgPwj6Nx4LE85SgolyEsrNMW659Gh21tbx/OzVfPaeyfzopO5UxB2YiHQqzSYFd18NnGhmpwP1fQvPuftLqVzU3acRJJfGzkzlvJ1dfm4Ot376WL70YBUvz1vDz1/fwJkn1tKtS9Q5iCIizYu69tG/3f228JZSQpDUdMnL5c7LKxjRr4SVW+u48cmZWllVRNJG+0B2QF0Lcrn98mPpkms8NW0lj1UuizskEekklBQ6qIP7lnD1qO4A/PDp2bzz/uaYIxKRzkBJoQM7bUhXPlkxgJ21u7nm4SlU76yNOyQR6eCUFDq4my84nOF9u/Hemmp+8PSsuMMRkQ5OSaGDKyrI4/bLR1GYn8PfpqzgcfUviEgKlBQ6gUP6lfDfFwQjhn/w9CzeXb0l5ohEpKNSUugkPjl6IBePOpAdu4L+hW016l8QkeQpKXQi/33BERxUVsz8D7byo6dnxx2OiHRASgqdSHGXPO64vILC/Bwer1rOX6u0a5uIJEdJoZMZ0b+Em88/HIAbn5rFgg/UvyAi0SkpdEKXjh7IhcccwPZddVzz8FS219TFHZKIdBBKCp2QmfHTi45kWFkx81Zv4aZn1L8gItEoKXRSxV3yuP0zo+iSl8Ojlct4auqKuEMSkQ5ASaETO7S8Oz86L+hf+K8nZ/Lemq
0xRyQi7Z2SQif36eMHcv7RB7Ctpo5rHp7Cjl3qXxCR/VNS6OTMjP+5+EiGlhbzzvtbuHn8nLhDEpF2TEkhC3TrksfvPnMsBXk5PDJ5Kc9MXxl3SCLSTikpZInDD+jBD889DIDv/XUGi9ZWxxyRiLRHSgpZ5PIxgzjnqHKqa+r46sNTqKnTNp4isjclhSxiZvzs4iMZ3KeIuas2c9Mr65m9clPcYYlIO6KkkGVKCvO54/JRlHYrYN66XZx320R++PQsNm3bFXdoItIOKClkocMP6MFL3zqNc4YXYWY88OYSTr/lZR59eym7d6tJSSSbKSlkqe6F+Vx1THeeu/YkxgztzfrqGm7460wuuuN1pi/bGHd4IhITJYUsN7J/d/5y9Qnc+ulj6de9C9OXb+LCO17ne3+bwfrqmrjDE5EMU1IQzIzzjz6ACd88jS+dOoy8HOORycs4/Vcv8+Cbi6lTk5JI1lBSkAbduuTxvY8dyj+uO4WTh5eyafsufvD0bM67bSJVS9bHHZ6IZICSguzj4L7deOCq4/n9FRUc2LMrc1Zt5pI73+T6x6bxwZYdcYcnIm1ISUGaZGZ89Ij+/Ov6U7n2jIMpyMvhb1NWcMavXuHu1xayq2533CGKSBvIizsAad+6FuRy/UdGcEnFAH48fg4T3vmAnzw3l8cql1FRCqvyVzKstBtDSosoKtCfk0hHp/9iiWRwn2LuufI4JsxdzY+fncO7q7fy7mp4ZPbUhteU9yhkaGlxw21YWTFDS7sxoFdX8nNVKRXpCJQUJClnHtqPsQeX8tyMVbw6YwHbcruxaG01S9ZVs2rTDlZt2sEb763bq0xejjGod1GYJIJEMaysGNOoJpF2R0lBklaYn8slFQMYwmoqKioAqK3bzYqN21m4tppFa6pZtHbPrf74wkYrsw7tmcc9Q7YyrKxbHN+GiDRBSUHSIi83h8F9ihncp5jTR+z93PaaOpasD5LFwrXVLFxTzZvvrWXRxh2cd9tE/ufiI7ngmAPjCVxE9qKkIG2ua0EuI/t3Z2T/7g3HNu/YxZfvfpU3lu/gur9M440F67jp/MPpWpAbY6Qiot4/iUX3wnyuP6EHP73oCArycni0chkX3D6R+au3xB2aSFZTUpDYmBmXjxnM09eMZVhZMe+u3sp5v5vIY5XLcFcntEgclBQkdoeWd2f8107i4mMPZMeu3XzniRl887HpVO+sjTs0kayjpCDtQnGXPH79qWP45SeOomt+Ln+buoLzfjeROSs3xx2aSFaJLSmYWa6ZTTWzZ8PHvc3sRTObH973iis2ic8nRw/kma+NZUS/EhauqebCO17n4beWqDlJJEPirClcB8xNePxdYIK7DwcmhI8lCw3vV8JT14zlsuMGUlO7m+8/OYuvPTKVLTu0ZahIW4slKZjZAOAc4O6EwxcA94df3w9cmOGwpB3pWpDLzy45it9edgzFBbk8N2MV5942kZnLN8UdmkinZnFUy83sCeB/gRLgW+5+rpltdPeeCa/Z4O77NCGZ2dXA1QDl5eUV48ePb3Uc27Zto6ioSOXbefmVW2r59aSNLNpYS57B544u4eMHF7F9+/YOEb/Kq3x7Kz969Ogqdx/d5JPuntEbcC5wR/j1acCz4dcbG71uQ0vnqqio8FRUVlaqfAcpv72m1n/41EwffMOzPviGZ/2L97/tL78xOWPXV3mV70zlgUrfz/tqHM1HY4HzzWwx8BfgDDN7CFhtZuUA4f0HMcQm7VRhfi43X3AEd14+ipLCPF6Ys5ofvbyejdu0j7RIOmU8Kbj799x9gLsPAS4DXnL3K4BngHHhy8YBT2c6Nmn/PnZkOX+/9mSGlRazeFMt4+57Wx3QImnUnuYp/Aw4y8zmA2eFj0X2MbB3EQ9/cQx9i3OZvmwjn/9TJdtqNNFNJB1iTQru/rK7nxt+vc7dz3T34eG9doqX/Srv0ZWbTu1FeY9CJi9ez9UPVLFjV13cYYl0eO2ppiCSlH7FeTz8hT
GUduvCxAVr+erDU6ip1d7RIqlQUpAObVhZNx7+whh6FeXz0jsf8PVHp1Jbp8Qg0lpKCtLhjehfwoOfH0NJYR5/n/k+335iBru11adIqygpSKdwxIE9uP+q4ykuyOXJqSv4/lMztV6SSCsoKUinMWpQL+658ji65OXwyORl3Dx+jhKDSJKUFKRTOWFYH+763GgKcnP40xuL+cXz85QYRJKgpCCdzqmHlHH75aPIyzHufPk9fvfSgrhDEukwlBSkUzrrsH7836eOIcfglhff5Y+vLow7JJEOQUlBOq3zjj6AX3ziaAB++ve5PPjm4ngDEukAlBSkU/tExQB+cuERAPzg6dk8Xrks5ohE2jclBen0rjhhMDeecygAN/x1Bs9MXxlzRCLtV17cAYhkwhdOHsaOXXX86oV3+caj0+iSl0Np3EGJtENKCpI1vnbGcLbvquP2f7/H1/48hRMO7ELpgmmtPt+6dRvpk0J537aFBbuXMrS0G0NLiyntVoCZtfp8IumgpCBZ5VsfGcH2mt3c+/oiXlu6A5auSO2EKZZ/at7Mhq9LuuQxtKyYYaXFQaIIvx5SWky3LvpXlczQX5pkFTPjB+ceypmH9mXSjHcYOmRIq8+1aPHiVpev2+1UzV3IzoIeLFxbzcI1W9myo5YZyzcxY/mmfV7ft6QLQ0uLGVZWzNAwaRTu0sJ/kn5KCpJ1zIyxB5dSuKkrFaMGtPo8Vb46pfLD7AMqKo4Bgr3S11fXsGhtNQvXVrNobTWL1oT366r5YMtOPtiyk7cW7dlmpHuXHG7MW8YnRg0gJ0fNTpIeSgoi7YCZ0adbF/p068LoIb33eq5ut7Ny4/YgQYS3qiUbmLliE995YgZ/fmspP77gcI4a0DOe4KVTUVIQaedyc4yBvYsY2LuIUw4pA4KaxW+efJ0/z93BtGUbueD217nsuEF8++wR9C4uiDli6cg0T0GkAzIzThnclZe+eSpXnzKMXDMembyUM255mYcmLaFO+0lIKykpiHRgJYX5/NfHD+WfXz+ZsQf3YeO2Xdz41CwuuH0iVUs2xB2edEBKCiKdwMF9S3jo82O44/JRHNCjkFkrNnPJnW/wrcens2bLzrjDkw5ESUGkkzAzPn5kOf/65ql87fSDKcjN4Ymq5Zzxq5e5d+Ii7V0tkSgpiHQyRQV5fOvsETz/jVM4bUQZW3bW8uNn53DOrROZtHBd3OFJO6ekINJJDS0t5r4rj+Puz41mYO+uzFu9hcvumsS1j0zl/U074g5P2ikNSRXpxMyMDx/Wj5OGl3LXqwu5/d8LeGb6Sv41dzWj++dz3Mb5DG2YJV1MUYHeErKd/gJEskBhfi7Xnjmci449kJ88N4fnZ6/m1aV1vLr03b1e17974V7LaQT33RjQqyv5uWpYyAZKCiJZZGDvIv7w2dHMWrGJ596Ywe7i0oZlNZasq+b9zTt4f/MO3mzU95CXYwzqXdRQoxhaVsyu9Ts5cNMO+nXvotVdOxElBZEsdMSBPdg5rIiKikMbjtXW7Wblxh0sXLt1ryU1Fq6pZuWm7cHCfWur9zrPTa9MoKgglyF9ihNWeA1rGaXd6FGUn+lvTVKkpCAiAOTl5jCoTxGD+hRx2oi9n9uxq44l67axaO3WcFXXamYtXs2aHca66hrmrNrMnFWb9zln7+KCPYkiYVnwwX2KMvRdSbKUFESkRYX5uYzoX8KI/iUNx6qqqqioqGDTtl0sWhcs/92wymu4wuv66hrWV9dQ2Wh2tRnkG9iT/2h1TL57d0rlSwpg5NS39moSG1ZazIE9u5KXxf0nSgoikpIeRfkcU9STYwb23Ou4u7N68849zVFr9jRJLV2/jZrdDrtTnFCXQvmdtTBxwVomLli71/H83Pr+k24J+1cEne5l3Tp//4mSgoi0CTOjf49C+vco5MSD9t4Ru7ZuN5OrpjDq2GNbff4pU6e2urw7THizkuJ+Q8OO9j2Ja+WmHby3ppr31lTD3L3LdeuS15Akar
Zu5qlls1od/5o1qZXfsmELFRWtLr5fSgoiknF5uTl0yTUK83NbfY5Uy5d3y6NiZF9Ob3R8e01dQkf71obRWQvXVLNp+y5mrtjEzBXh7njvLWn19VMt37NL2zRxKSmIiCToWpDLYQd057ADuu/z3IbqmoYk8c6ChQwaNKjV11m6dGlK5VevWNbqss1RUhARiahXcQEVxQVUDO5FFaupqBjS6nNVFaxLrXxV26xjlb1d7CIisg8lBRERaaCkICIiDTKeFMxsoJn928zmmtlsM7suPN7bzF40s/nhfa9MxyYiku3iqCnUAt9090OBE4BrzOww4LvABHcfDkwIH4uISAZlPCm4+yp3nxJ+vYVgesiBwAXA/eHL7gcuzHRsIiLZLtY+BTMbAhwLvAX0c/dVECQOoG+MoYmIZKXY5imYWTfgr8DX3X1z1PVEzOxq4Orw4VYzm5dCGKXA2hZfpfIqr/Iq37nKD97vM+6e8RuQDzwPXJ9wbB5QHn5dDszLQByVKq/yKq/y2Vh+f7c4Rh8ZcA8w191/nfDUM8C48OtxwNOZjk1EJNvF0Xw0FvgsMNPMpoXH/gv4GfCYmX0eWAp8MobYRESyWsaTgrtPBPbXgXBmJmMB7lJ5lVd5lc/S8k2ysG1KREREy1yIiMgeSgoiItJASUFERBpkXVIws9KWX7VPmQfD++vSH5FIx2FmvczseDM7pf7WinN0NbMRbRFfM9fMNbOHMnnNJmL4hZl1N7N8M5tgZmvN7Io4Y2pK1iQFMzvPzNYQDIVdbmYnJlG8wswGA1eF/xS9E29JxNDPzO4xs3+Ejw8Lh+Am832MNbPi8OsrzOzXYWzJxHBueMvoUiIWuMLMfhg+HmRmx7fiPCea2WfM7HP1tyTK5prZAeG1B5lZUvshtuZ3aGYzzWzG/m5JXv+Q8A1lVvj4KDO7MWLZ68I3JQu/hylm9pEkrv0F4FWCiac3h/c3JRn/ecA04J/h42PM7Jkkyv88yrHG3L0OKDOzgujR7nOdc81sqpmtN7PNZrbFzDYncYqPuPtm4FxgOXAI8O0krp+ZpNIWM+La4w2YAYwMvx4DvJJE2WsJFu7bCSxMuC0CFiZxnn8AlwLTw8d5wMxWfB8GHB1+fV3U7yW89hKCBQcfCOP/RIRyW4DN+7slEfudwO0EExcBegFvJ/n9Pwi8AdwB3Bbebo1Y9j8JlgWYDcwMbzOSvH7Sv0OCJQUGA78Ib0eGt58BP0zy+q8AxwNTE47Nili2PuazCSaLHg1MSeLaM4FCYFr4eCTwaJLxVwE9GsUf+XfQVLxRywN/AN4GfgBcX39L4toLgKMIR20mewNmh/d/BD6a+DuJWL7+535R+D/cO5nyUW/ZtEdzrbu/A+Dub5lZSdSC7n4rcKuZ3Qn8HqivMr/q7tOTiKHU3R8zs++F5601s7okykPwfbiZXQD81t3vMbNxLZYKfB84zt0/ADCzMuBfwBPNFXL3kvD1PwbeJ3hjNuByIPLPERjj7qPMbGp43g2t+OQ2GjjMw/+OJF0HjHD3VDa3Tfp36O5LIKjlufvYhKe+a2avAz9O4vpF7j7Z9l4rrDZi2fpCHwfuc/fpZhEXHQvscPcdZoaZdXH3d1rRDFTr7puSuyyY2VeArwLDGtWuSoDXI55mZXjLIbm/23rLCBJwa8fxjzezd4DtwFfD/78dSZTPD+8/Djzi7uuT/TlGkU1Joa+ZXb+/x773khv78w7wEPA3gn+wB83sj+5+W8QYqs2sD+AAZnYCsCli2XpbwjekK4BTzCyXPX8sLcmpTwihdSTXhHi2u49JeHynmb1F8Ok3il1hvPXffxmwO4nrA8wC+gOrkiwHwT91sj/vxlL5HRab2UkeTOAkbMIsTvL6a83soITrf4LoP4sqM3sBGAp8L/xglMzPf7mZ9QSeAl40sw0Eb7LJmGVmnwFyzWw4QS38jQjl/kxQS/tf9t5rZYu7r49yYXe/GSD8vt3dtyYVOXwH+LuZvULQal
B/3ijvHbj7d8Omrs3uXmdm1QRbBkSValKJJGsmr5nZj5p7vv4PpoVzzAA+5O7V4eNi4E13PypiDKMImjuOIHhzKyNovoncrmxm/YHPEDS7vBa2iZ/m7g9EKPsLgiaDR8JDnyKoet8Q8dpvEDT//IXgTenTwDXuHql/xswuD685iqD6+wngRnd/PEr58Bz/Bo4BJrP3P+b5EcreA4wAnqMV/9ThOep/h4cTNENF/h2aWQVwL0HzCcBG4CoP9xeJeP1hBDNZTwQ2EDQBXuHuiyOUzSH42S10941hcjswmb+/hHOdSvB9/NPda5IoV0RQY/0IwQer54H/dvdIb25hQlzu7jvN7DSC5pwH3H1jhLJHENRy6/sB1wKfc/fZEa/9ArCVoBmtIZm29N5hZme4+0tmdnFTz7v736JcPzxXL/YklWKgxN3fj1o+0jWyJSmkg5nNJGh+2RE+LiR4cz4yiXPkEbwxGcFKsLvaJNimr/1zgr0rTgqv/ypwQhJJYQjwW4L1q5yg2v71KG9ICecYSbCciRHstDc3iW+h/s1oH+7+SoSyTX4wiPKBIOEchcDXCNrltwBvArdFfVMLz9Gd4H+v1bWW8A0hx4ONqlp67ciwqWdUU88nk5TSKaw1FnvQ+Rq1zDSCJsQhBAnlGYImwY9HKPsG8H13/3f4+DTgf5L4UFPp7qOjxppQ7mZ3/5GZ3dfE0+7uV7VQvslkknCCyEkliqxJCmZ2a3PPu/u1Ec5xPcEKrk+Ghy4E/uTuv0kijhMJ/qAbmu4ifsqf6O4nmdkWwqaD+qeCU3j3COeY4u6jGh2bEbWmk6qwqWV2/RtZWI0/zN3fysT108HMHiPoYH84PPRpoJe7t7iAo5l1AS5h399/5D4FC4ZF30eQkP5IUOv6rru/0EyZu9z96rCW1Zi7+xlRr58qM/sz8GWgjj2dzr92919GLD8l7Jf6DrDd3W8zs6nufmyEstPd/eiWjjVT/mfAS839rNtCQjLpS1BDfCl8fDrwsrs3mzSSvl4WJYXEztibgb0+Nbr7/UQQftpq+KTt7lOTiOFB4CCCIXn1nZMeJSGlIrGTDngv4akS4HV3jzSsLWzD/CL7vqk1+0knofxUYFR9R13YnFHZOFHtp2w6kmIZQbvw4QSjaOrjj/ymmMobi5n9k6D/oYo9v3/c/ZZkr29mZwPXEIykuS/Kz7A9MLNp7n5M2JRYAdwAVCXRBPsW8BuCJqjz3H2Rmc1y9yMilH0SmELQhARBv9xod78w4rW3EPQB7QR2kcTfXlg+pQ8FZvYs8EUPd6g0s3Lg9nQnhazpaE580zezr0dNAk2cZwrBH1ZrpDJyJhUpd9KFngZeIxixlOyoKQg+hDR87+6+O2xOa5G7nxTet2bUSL2HgUcJxol/maDWtybJc0w1sxPcfRKAmY0h+uiXAe7+0SSv11hKI4haW1NNo3wzyyeoZf/O3XeZWTL/D/9B8Lv7aZgQhhIM/tgvM3vQ3T9L8Lc7hD0DRV4JzxdJin97EPz/1H8o2NnCa5sypD4hhFYTzHVIq6xJCo3EVT1KZeRMq4Vt15sImjpSURS1/2E/FprZtQTzFSCovSxMMaZk9PFgCO91YR/EK+FIkhaF/UlOMNLrc2a2NHw8GJgT8fpvmNmR7j6zNcGHWj2CaH81VYI5K5nyB2AxMB141YKJl5H7FNx9DsGIpfrHiwjmezSnfvLpOIImF2PPe0BSYzrDjt7h7F3TfDVi8VQ/FLxsZs8TDBRx4DKgqSbBlGRN81GiptrW2/h64wl+iSW0cuRMe2BmPwHecPe/t7J8X+BW4AyCn8cEgo7qD5otmCZmNsndTwj/sW4lGE75hLsfFKFss7PGPZyL0MI55gAHE4wY2sme5ofIfTqpjCAys7nEU1NtlpnluXuzcy3M7DF3vzQhOe+luZ9h+EHkKwTNpysSnwqK+rCIcX6BYK7LAILEegLB6MNIzY9mdhfBoIRWfygIO51PDh++6u5PNvf6Vl
2jnf19tJmEtmgDugLb6p8iiXbBVl771PA6Pydo0254Cvi57z32v91KpU01HGlyf9T+i7ZgZucSNCEMJBhW2h24yd3HZ+j6TSaWKAml0Xla9WnVzB4Hrm3UBJFxZnYO+/brNNuubmbl7r4qlZ+hmd3p7l9JNt6E8jOB44BJYb/ISOBmd/9UC+VmEdTm8gh+bwtp5YeCTMia5qM0tAemcu1XAMws3xsNnTSzrvFElTx3L7Fgrae93pAilq0zszIzK/AkxrWn2SeBie4+Czg9/F5+BWQkKZCGZsv9fVolqH3tr0xiTXWOmcVWUzWz3wNFBM04dxPMVZncUrn6RJZsAm10jlYnhFBrZ3QfSFC7S0k4eu824FCgAMgFqtP9gTZrkoIF48u/TFB9nwHc21KVNY3XTscU/djt5w3pDaJvo7oYeN2CBdCq6w96EpPHUnSUJ0xy8mCZgBaHMqbRc+yprRYS9AvMI/jUHNV17Pm0enr9p9UWyvyKPTXVCxOO1x/LpBPd/ahwKPTNZnYLQcdvs5oYddbwFG1c00/Q2hndi1JJZgl+R9CP8DjBoJXPEbyfpVXWJAWCGbS7CJoPPk7wj3hdhq6drtE/cWvNG1KiVNeeSVWOmfVy9w0AYU0hY/8D3miSYzi8+UtJnibpT6vtrKa6PbzfZmYHECy1MrSlQnHW9BNiuCj88iYL5nz0IFzttQWNl9hpfN7IH4rcfYGZ5Xqw6ut9FkzIS6tsSgqH1f9TWrDcQYtV1nRJ4+ifuKW0IJonMXO4jdxCMALoCYJPnZcCP40rGHefYmbHJVks6U+r7aym+mwY/y8JhnY7QTNSh2BmJwHD3f0+C+a9HEgwcKA5uUA3khzp1IRtFiwgOc2CJWtWkfzaWS3Kpo7mvUYcZXoEUmcQTv75D+DrBG3YG4B8j7DEQFj+3zQ9ciSTM2oPI4i9fpmNqMNJ03HtxE+LOQSzkfu4+9mtPF+k9YfMrAfBMuXtqqZqwWSuQk9huY9MsmCZlNEEy2ocEtZ0Hve9V75tqlxa3mvCTvbVBP0J3yD43d/h7gtSPfde18mipFDHnnbsxBFImWyT7DSiviE1KlOR8LCQYHZnrbt/Zz9FOhXbe+2lWoI+lr96Eusmhedp/Gm1Wzhev92zYEG8bwKD3P2LFqyUOsLdn405tBZZsO7SsQR7OhwbHmtxmRiLuAxHxBgK2DNhrU3WTsua5iN3z407hs6kcdt0xDJVjQ69bhEnj3UGnvrSzXt9WiVYAymfYEZvs59W25H7CGb0fih8vJyg47TdJwWgxt3dwhnYFu6AGEHUgRjNsmABv/sJPkwYMNDMxiUxeS6SrEkKEj/be+vSHIK1b/rHFE7GWaOlm81sLTAuHCIb1UWEn1YB3H2lJbFhVDtwkLt/ysw+DeDu283aYKeYtvGYmf0B6GlmXwSuIliUsFlpbKK7hWBLz3kAZnYIwezmimZLJUlJQTKpij1DMmsJOuiS2qO6g7uLYPvHxKWb6/dGiKq1n1bbi5pwxFN9/AfRunWA4lBGsEvhZoKa2g+BD2fw+vn1CQHA3d+1YB2ptMqaPgWRuFmKSzeHr/8WweTBswg6jq8C/uzRd/+LlZmdBdwIHAa8QNDsdaW7vxxnXFE01WEcpU8hjde/lyCZ1q/yejmQ5+6RF/WLdB0lBcmU8FPNV9izx/XLwB/aorOsPbIUl25OOM9ZJOxc5u4vpjPOtmbBek0nEMQ/yd3XxhxSsyxNS8+nIY4uBMulJ26SdYe7p7WmpaQgGWNmdxN0jNYvW/5ZoM7dvxBfVG3PwqWbwyGpQ9jzT/0Kwdo5G+KMLxNsP7u+1fOYdn+Lor0O6W0rSgqSMeloPumILFgd9WMEW0c2Xro5UkdkO1nmodVs713fmtokKWNzVToa28/KsPXS3XyljmbJpDozO8jd3wOwYBP61mzW09H8nmA5hGFAZcLx+uTQ4tLN7WGZh1S4++nQsKzGVwlqS06w7MydzRSVYFOojFFNQT
LGzM4A/sSejXWGAP9RPxqns7MUl25OOE/i5LVSoKQDTV5rao/rnu5+aXxRSSLVFCST+gBHECSDCwiGYnaIJQ7SIU0JofHktQI61uS1EY2aC/9tZtNji6YDsWCDnZ8DfQlqmW3SdJiTzpOJtOAH7r6ZYHObswiaVdR0kJyLgPMJl2xx95XEs+Jsa00N9wUASHaP62z3C+B8d+/h7t3dvaQt+pKUFCST6vsPzgF+7+5PE3zSlehqPGjz7aiT18YQrFS72MwWE2wQdKqZzWy0gqvsa7W7z23ri6j5SDJpRbhMwIeBn4fjrvXBJKJwOYhnW7PUQjuSysb12a7SzB4lWDY9cee8FjcpSoY6miVjwhUyPwrMdPf5ZlYOHOnuL8QcWodhZlOAG+jAk9ekdczsviYOu7tfldbrKCmIdBxmdjvwJ3d/O+5YJLPMrDDZZdZbdR0lBZGOI5wIdwiwhL33uc7I+jsSHzNbQLDJzmsES1y83hYbFCkpiHQg4e5b+/D0bAwv7ZyZDQJOJhiC/HFgo7sfk85rqKNZpAPRm3/2MrMBBMngZOBoYDYwMe3XUU1BRKT9M7PdwNvA/4TDudvmOkoKIiLtn5kdTbBm1CnAIGA+8Iq735PW6ygpiIh0DGbWjSAxnEywH4e7+5B0XkN9CiIiHYCZVQJdgDcI+hJOaYs+JtUUREQ6ADMrc/c1bX0dLTEgItIx1JjZr82sMrzdEu4Kl1ZKCiIiHcO9wBbg0vC2mWD59LRS85GISAdgZtMaT1Rr6liqVFMQEekYtoe77gFgZmOB7em+iGoKIiIdQDhP4QGgvh9hAzDO3dO6D4WSgohIO2Zm1yc+BOo3VqommKfw63ReT/MURETat/rtVkcAxwFPEySHKwhWS00r1RRERDoAM3sBuMTdt4SPS4DH3T2tu9mpo1lEpGMYBNQkPK4BhqT7Imo+EhHpGB4EJpvZk4ADFwH3p/siaj4SEekgzGwUwWJ4AK+6+9S0X0NJQURE6qlPQUREGigpiIhIAyUFkZCZfd/MZpvZDDObZmZj2vBaL5vZ6LY6v0hrafSRCGBmHwLOBUa5+04zKwUKYg5LJONUUxAJlANr3X0ngLuvdfeVZvZDM3vbzGaZ2V1mZtDwSf//zOxVM5trZseZ2d/MbL6Z/SR8zRAze8fM7g9rH0+YWVHjC5vZR8zsTTObYmaPh1suYmY/M7M5YdlfZfBnIVlMSUEk8AIw0MzeNbM7zOzU8Pjv3P04dz8C6EpQm6hX4+6nAL8nWHrgGuAI4Eoz6xO+ZgRwl7sfRbD+/VcTLxrWSG4EPuzuo4BK4Hoz600wDv3wsOxP2uB7FtmHkoII4O5bgQrgamAN8KiZXQmcbmZvmdlM4Azg8IRiz4T3M4HZ7r4qrGksBAaGzy1z99fDrx8i2HQ90QnAYcDrZjYNGAcMJkggO4C7zexiYFu6vleR5qhPQSTk7nXAy8DLYRL4EnAUMNrdl5nZTUBhQpGd4f3uhK/rH9f/bzWeCNT4sQEvuvunG8djZscDZwKXAV8jSEoibUo1BRHAzEaY2fCEQ8cA88Kv14bt/J9oxakHhZ3YAJ8GJjZ6fhIw1swODuMoMrNDwuv1cPe/A18P4xFpc6opiAS6AbeZWU+gFlhA0JS0kaB5aDHwdivOOxcYZ2Z/AOYDdyY+6e5rwmaqR8ysS3j4RoK9eJ82s0KC2sQ3WnFtkaRpmQuRNmJmQ4Bnw05qkQ5BzUciItJANQUREWmgmoKIiDRQUhARkQZKCiIi0kBJQUREGigpiIhIAyUFERFp8P+Yi1RsP/cksQAAAABJRU5ErkJggg==\n", "text/plain": [ "
    " ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "freq.plot(20, cumulative=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Eliminamos StopWords" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "['i',\n", " 'me',\n", " 'my',\n", " 'myself',\n", " 'we',\n", " 'our',\n", " 'ours',\n", " 'ourselves',\n", " 'you',\n", " \"you're\",\n", " \"you've\",\n", " \"you'll\",\n", " \"you'd\",\n", " 'your',\n", " 'yours',\n", " 'yourself',\n", " 'yourselves',\n", " 'he',\n", " 'him',\n", " 'his',\n", " 'himself',\n", " 'she',\n", " \"she's\",\n", " 'her',\n", " 'hers',\n", " 'herself',\n", " 'it',\n", " \"it's\",\n", " 'its',\n", " 'itself',\n", " 'they',\n", " 'them',\n", " 'their',\n", " 'theirs',\n", " 'themselves',\n", " 'what',\n", " 'which',\n", " 'who',\n", " 'whom',\n", " 'this',\n", " 'that',\n", " \"that'll\",\n", " 'these',\n", " 'those',\n", " 'am',\n", " 'is',\n", " 'are',\n", " 'was',\n", " 'were',\n", " 'be',\n", " 'been',\n", " 'being',\n", " 'have',\n", " 'has',\n", " 'had',\n", " 'having',\n", " 'do',\n", " 'does',\n", " 'did',\n", " 'doing',\n", " 'a',\n", " 'an',\n", " 'the',\n", " 'and',\n", " 'but',\n", " 'if',\n", " 'or',\n", " 'because',\n", " 'as',\n", " 'until',\n", " 'while',\n", " 'of',\n", " 'at',\n", " 'by',\n", " 'for',\n", " 'with',\n", " 'about',\n", " 'against',\n", " 'between',\n", " 'into',\n", " 'through',\n", " 'during',\n", " 'before',\n", " 'after',\n", " 'above',\n", " 'below',\n", " 'to',\n", " 'from',\n", " 'up',\n", " 'down',\n", " 'in',\n", " 'out',\n", " 'on',\n", " 'off',\n", " 'over',\n", " 'under',\n", " 'again',\n", " 'further',\n", " 'then',\n", " 'once',\n", " 'here',\n", " 'there',\n", " 'when',\n", " 'where',\n", " 'why',\n", " 'how',\n", " 'all',\n", " 
'any',\n", " 'both',\n", " 'each',\n", " 'few',\n", " 'more',\n", " 'most',\n", " 'other',\n", " 'some',\n", " 'such',\n", " 'no',\n", " 'nor',\n", " 'not',\n", " 'only',\n", " 'own',\n", " 'same',\n", " 'so',\n", " 'than',\n", " 'too',\n", " 'very',\n", " 's',\n", " 't',\n", " 'can',\n", " 'will',\n", " 'just',\n", " 'don',\n", " \"don't\",\n", " 'should',\n", " \"should've\",\n", " 'now',\n", " 'd',\n", " 'll',\n", " 'm',\n", " 'o',\n", " 're',\n", " 've',\n", " 'y',\n", " 'ain',\n", " 'aren',\n", " \"aren't\",\n", " 'couldn',\n", " \"couldn't\",\n", " 'didn',\n", " \"didn't\",\n", " 'doesn',\n", " \"doesn't\",\n", " 'hadn',\n", " \"hadn't\",\n", " 'hasn',\n", " \"hasn't\",\n", " 'haven',\n", " \"haven't\",\n", " 'isn',\n", " \"isn't\",\n", " 'ma',\n", " 'mightn',\n", " \"mightn't\",\n", " 'mustn',\n", " \"mustn't\",\n", " 'needn',\n", " \"needn't\",\n", " 'shan',\n", " \"shan't\",\n", " 'shouldn',\n", " \"shouldn't\",\n", " 'wasn',\n", " \"wasn't\",\n", " 'weren',\n", " \"weren't\",\n", " 'won',\n", " \"won't\",\n", " 'wouldn',\n", " \"wouldn't\"]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from nltk.corpus import stopwords\n", "stopwords.words('english')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert everything to lowercase..." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "tokens = [x.lower() for x in tokens]" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "# Build the stop-word set once; membership tests on a set are fast\n", "sr = set(stopwords.words('english'))\n", "# Keep only the tokens that are not stop words, preserving their order\n", "tokens_limpios = [token for token in tokens if token not in sr]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extract the most frequent tokens..." 
] }, { "cell_type": "code", "execution_count": 58, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "php::1\n", "hypertext:1\n", "preprocessordownloadsdocumentationget:1\n", "involvedhelpgetting:1\n", "startedintroductiona:1\n", "simple:1\n", "tutoriallanguage:1\n", "referencebasic:1\n", "syntaxtypesvariablesconstantsexpressionsoperatorscontrol:1\n", "structuresfunctionsclasses:1\n", "objectsnamespacesenumerationserrorsexceptionsfibersgeneratorsattributesreferences:1\n", "explainedpredefined:1\n", "variablespredefined:1\n", "exceptionspredefined:1\n", "interfaces:1\n", "classescontext:1\n", "options:1\n", "parameterssupported:1\n", "protocols:1\n", "wrapperssecurityintroductiongeneral:1\n", "considerationsinstalled:1\n", "cgi:1\n", "binaryinstalled:1\n", "apache:1\n", "modulesession:1\n", "securityfilesystem:1\n", "securitydatabase:1\n", "securityerror:1\n", "reportinguser:1\n", "submitted:1\n", "datahiding:1\n", "phpkeeping:1\n", "currentfeatureshttp:1\n", "authentication:1\n", "phpcookiessessionsdealing:1\n", "xformshandling:1\n", "file:1\n", "uploadsusing:1\n", "remote:1\n", "filesconnection:1\n", "handlingpersistent:1\n", "database:1\n", "connectionscommand:1\n", "line:2\n", "usagegarbage:1\n", "collectiondtrace:1\n", "dynamic:1\n", "tracingfunction:1\n", "referenceaffecting:1\n", "php's:1\n", "behaviouraudio:1\n", "formats:1\n", "manipulationauthentication:1\n", "servicescommand:1\n", "specific:2\n", "extensionscompression:1\n", "archive:1\n", "extensionscryptography:1\n", "extensionsdatabase:1\n", "extensionsdate:1\n", "time:1\n", "related:4\n", "extensionsfile:1\n", "system:1\n", "extensionshuman:1\n", "language:2\n", "character:1\n", "encoding:1\n", "supportimage:1\n", "processing:1\n", "generationmail:1\n", "extensionsmathematical:1\n", "extensionsnon-text:1\n", "mime:1\n", "outputprocess:1\n", "control:1\n", "extensionsother:2\n", "basic:1\n", "servicessearch:1\n", "engine:1\n", 
"extensionsserver:1\n", "extensionssession:1\n", "extensionstext:1\n", "processingvariable:1\n", "type:1\n", "extensionsweb:1\n", "serviceswindows:1\n", "extensionsxml:1\n", "manipulationgui:1\n", "extensionskeyboard:1\n", "shortcuts?this:1\n", "helpjnext:1\n", "menu:2\n", "itemkprevious:1\n", "itemg:1\n", "pprevious:1\n", "man:2\n", "pageg:1\n", "nnext:1\n", "pagegscroll:1\n", "bottomg:1\n", "gscroll:1\n", "topg:1\n", "hgoto:1\n", "homepageg:1\n", "sgoto:1\n", "search(current:1\n", "page)/focus:1\n", "search:1\n", "boxphp:1\n", "popular:2\n", "general-purpose:1\n", "scripting:1\n", "especially:1\n", "suited:1\n", "web:1\n", "development.fast,:1\n", "flexible:1\n", "pragmatic,:1\n", "php:107\n", "powers:1\n", "everything:1\n", "blog:2\n", "websites:1\n", "world.download8.1.5·release:1\n", "notes·upgrading8.0.18·release:1\n", "notes·upgrading7.4.29·release:1\n", "notes·upgrading15:1\n", "apr:3\n", "2022php:11\n", "8.0.18:2\n", "released!the:20\n", "development:20\n", "team:24\n", "announces:20\n", "immediate:20\n", "availability:21\n", "8.0.18.:1\n", "bug:9\n", "fix:10\n", "release.all:18\n", "8.0:7\n", "users:20\n", "encouraged:19\n", "upgrade:19\n", "version.for:23\n", "source:45\n", "downloads:24\n", "please:26\n", "visit:24\n", "ourdownloads:20\n", "page,:20\n", "windows:23\n", "binaries:20\n", "found:32\n", "onwindows.php.net/download/.:20\n", "list:25\n", "changes:20\n", "recorded:20\n", "thechangelog.14:3\n", "8.1.5:2\n", "8.1.5.:1\n", "8.1:11\n", "7.4.29:2\n", "7.4.29.:1\n", "security:11\n", "release:31\n", "users.this:1\n", "primarily:1\n", "due:1\n", "necessarily:1\n", "upgrades:1\n", "openssl:1\n", "zlib:1\n", "dependencies:1\n", "issues:5\n", "found.:1\n", "7.4:5\n", "thechangelog.17:6\n", "mar:2\n", "8.1.4:2\n", "8.1.4.:1\n", "8.0.17:2\n", "8.0.17.:1\n", "feb:3\n", "8.1.3:2\n", "8.1.3.:1\n", "8.0.16:2\n", "8.0.16.:1\n", "7.4.28:2\n", "7.4.28.:1\n", "thechangelog.21:2\n", "jan:2\n", "8.1.2:2\n", "8.1.2.:1\n", "thechangelog.20:1\n", "8.0.15:2\n", 
"8.0.15.:1\n", "dec:3\n", "2021php:15\n", "8.1.1:2\n", "8.1.1.:1\n", "thechangelog.16:2\n", "8.0.14:2\n", "8.0.14.:1\n", "7.4.27:2\n", "7.4.27.:1\n", "thechangelog.25:1\n", "nov:6\n", "8.1.0:6\n", "8.1.0.:1\n", "marks:1\n", "latest:1\n", "minor:1\n", "language.php:1\n", "comes:1\n", "numerous:1\n", "improvements:1\n", "new:6\n", "features:6\n", "as:enumerationsreadonly:1\n", "propertiesfiberspure:1\n", "intersection:1\n", "typesneverreturn:1\n", "typefirst-class:1\n", "callable:1\n", "syntax\"final\":1\n", "modifier:1\n", "class:1\n", "constantsnewfsyncandfdatasyncfunctionsnewarray_is_listfunctionexplicitoctal:1\n", "numeral:1\n", "notationand:1\n", "much:2\n", "more...take:1\n", "look:1\n", "thephp:5\n", "announcement:1\n", "addendumfor:1\n", "information.for:1\n", "thechangelog.themigration:1\n", "guideis:1\n", "available:5\n", "manual.:1\n", "consult:1\n", "detailed:1\n", "backward:1\n", "incompatible:1\n", "changes.many:1\n", "thanks:1\n", "contributors:1\n", "supporters!22:1\n", "foundation:1\n", "announcedthe:1\n", "foundationhas:1\n", "beenannouncedas:1\n", "entity:1\n", "funding:1\n", "work:1\n", "developing:1\n", "language.for:1\n", "information:5\n", "regarding:1\n", "structure:1\n", "purpose:1\n", "foundation,:1\n", "check:1\n", "post:1\n", "at:jetbrains.com.19:1\n", "8.0.13:2\n", "8.0.13.:1\n", "thechangelog.18:2\n", "7.3.33:2\n", "7.3.33.:1\n", "7.3:2\n", "7.4.26:2\n", "7.4.26.:1\n", "thechangelog.11:1\n", "rc:12\n", "6:2\n", "testingthe:4\n", "pleased:4\n", "announce:4\n", "8.1.0,:8\n", "6.:1\n", "sixth:2\n", "final:1\n", "candidate,:4\n", "continuing:4\n", "cycle,:4\n", "rough:4\n", "outline:4\n", "specified:4\n", "wiki.for:4\n", "thedownload:4\n", "page.please:4\n", "carefully:4\n", "test:8\n", "version:8\n", "report:4\n", "thebug:4\n", "reporting:4\n", "system.please:4\n", "use:4\n", "production,:4\n", "early:4\n", "changes,:4\n", "read:4\n", "thenewsfile:4\n", "theupgradingfile:4\n", "complete:4\n", "upgrading:4\n", "notes.:4\n", "files:4\n", 
"also:4\n", "archive.the:4\n", "next:4\n", "production-ready,:1\n", "general:1\n", "release,:1\n", "planned:4\n", "25:1\n", "november:2\n", "2021.the:4\n", "signatures:4\n", "inthe:4\n", "manifestor:4\n", "onthe:4\n", "qa:4\n", "site.thank:4\n", "helping:4\n", "us:4\n", "make:4\n", "better.28:2\n", "oct:5\n", "5:2\n", "5.:1\n", "fifth:2\n", "last:1\n", "candidate:3\n", "(rc:3\n", "6),:1\n", "11:1\n", "7.3.32:2\n", "7.3.32.:1\n", "fpm:1\n", "thechangelog.22:1\n", "7.4.25:2\n", "7.4.25.:1\n", "8.0.12:2\n", "8.0.12.:1\n", "4:2\n", "4.:1\n", "fourth:2\n", "5),:1\n", "28:1\n", "october:2\n", "better.30:1\n", "sep:1\n", "3:2\n", "3.:1\n", "third:1\n", "4),:1\n", "14:1\n", "better.older:1\n", "news:1\n", "entriesupcoming:1\n", "conferencesdutch:1\n", "conference:2\n", "2022:1\n", "-:1\n", "scheduleinternational:1\n", "berlin:1\n", "russia:1\n", "2022user:1\n", "group:1\n", "eventsspecial:1\n", "thankssocial:1\n", "media@official_phpcopyright:1\n", "©:1\n", "2001-2022:1\n", "groupmy:1\n", "php.netcontactother:1\n", "php.net:1\n", "sitesprivacy:1\n", "policyview:1\n" ] } ], "source": [ "freq = nltk.FreqDist(tokens_limpios)\n", "for key,val in freq.items():\n", " print (str(key) + ':' + str(val))" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYUAAAFCCAYAAAAezsFEAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAA/LUlEQVR4nO2deXzcVdX/3yfpmi7pSkkptKxVLAVJkX0TURQUFxT5iRZQeR5RARcEfVTE5RE3HhEVRBErCooIQpFVoOwCaUvZSilSCoXS0n1J0zbN+f1x7yTTbPP9ziQzSebzfr3mlZnvzPnek1m+595zz2LujhBCCAFQUWoFhBBC9BxkFIQQQjQjoyCEEKIZGQUhhBDNyCgIIYRopl+pFSiEMWPG+KRJk/KW37RpE4MHD5a85CUv+bKSnz179gp3H9vuk+7ea2+1tbVeCHV1dZKXvOQlX3byQJ13cF2V+0gIIUQzMgpCCCGakVEQQgjRjIyCEEKIZmQUhBBCNCOjIIQQohkZBSGEEM306uS1fLn3+WXMnLeU3QY2UFtbam2EEKLnUJYrhf8s38hNc1/j+ZVbSq2KEEL0KMrSKNSMGATAivptJdZECCF6FuVpFKpDvZCV9U0l1kQIIXoWZWkUxmdWCpu0UhBCiGzK0ijsMGwQlRXGmoYmNjfKMAghRIayNAqVFca4YQMBWLZ2c4m1EUKInkNZGgWAmhFhX+H1tZtKrIkQQvQcytcoVId9haUyCkII0UzZGoXxmZXCmoYSayKEED2H8jUKWikIIUQbytYoZPYUlmqlIIQQzZStURhfndlollEQQogMZWsUMqUu5D4SQogWus0omNnvzWy5mT2TdWyUmd1tZgvj35FZz33dzF40swVm9p7u0ivD6CED6F8Ba+q3Ur+lsbuHE0KIXkF3rhT+ABzX6tgFwD3uvidwT3yMme0NfBx4W5T5tZlVdqNumBmjB4chFIEkhBCBbjMK7v4AsKrV4ROBGfH+DOCDWcf/4u6b3X0R8CLwju7SLcPoqmAU5EISQoiAuXv3ndxsEnCru0+Jj9e4+4is51e7+0gz+yXwb3f/Uzx+FXC7u9/QzjnPBM4EqKmpqZ05c2be+l3yyEoefm0rZ00bzjG7VqWWr6+vp6oqvZzkJS95yZdSftq0abPdfVq7T7p7t92AScAzWY/XtHp+dfz7K+DUrONXAR/Jdf7a2lovhC9dfZ9PPP9W/7+7F+QlX1dXV9D4kpe85CVfCnmgzju4rhY7+miZmdUAxL/L4/ElwM5Zr5sAvN7dyozNuI+0pyCEEEDxQ1JvAabH+9OBm7OOf9zMBprZrsCewOPdrUxmT0FF8YQQItCvu05sZtcBRwFjzGwJcCFwMXC9mX0aeAX4KIC7P2tm1wPPAY3A59292xsdjKkKNnGpEtiEEALoRqPg7qd08NQxHbz+B8APukuf9hgzOOM+2oS7Y2bFHF4IIXocZZvRDFDV3xgyoJKNW7axbpMS2IQQoqyNgpmp2Y4QQmRR1kYB1GxHCCGyKXuj0FwtVWGpQggho6BqqUII0ULZG4XxarYjhBDNyChUa6NZCCEylL1RaHEfaaUghBBlbxQyK4Wlaxtoauq+irFCCNEbKHujMHhAJSOq+rOlsYmVG7eUWh0hhCgpZW8UAGqaVwvaVxBClDcyCsD4mMCmXAUhRLkjo4ByFYQQIoOMAtnuI60UhBDljYwCsFOmKN4arRSEEOWNjALZRfG0UhBClDcyCmSXutBKQQhR3sgoAOOGD8IM3ljXQOO2plKrI4QQJUNGARjQr4IxQwfS5LB8/eZSqyOEECVDRiEyXs12hBBCRiFDjZrtCCGEjEIGJbAJIYSMQjNqyymEEDIKzTSHpWqlIIQoY2QUImq2I4QQMgrNyH0khBAyCs2MHTaQfhXGig2b2dy4rdTqCCFESZBRiFRWGOOGBxfSG3IhCSHKFBmFLGrUbEcIUebIKGRRowgkIUSZI6OQxXiV0BZClDkyClm
0uI+0UhBClCcyClm0JLBppSCEKE9kFLIYr7acQogyR0YhC7mPhBDljoxCFqOGDGBgvwrWNTSycXNjqdURQoiiUxKjYGZfMrNnzewZM7vOzAaZ2Sgzu9vMFsa/I0ugV/NqQWGpQohypOhGwcx2As4Gprn7FKAS+DhwAXCPu+8J3BMfFx012xFClDOlch/1AwabWT+gCngdOBGYEZ+fAXywFIqp2Y4QopwpulFw99eAnwKvAEuBte5+FzDO3ZfG1ywFdii2bqBqqUKI8sbcvbgDhr2CvwMnA2uAvwE3AL909xFZr1vt7m32FczsTOBMgJqamtqZM2fmrUt9fT1VVVXbHbvrP/X8Zs463jlpMJ8/oDq1fKHjS17ykpd8d8tPmzZttrtPa/dJdy/qDfgocFXW408BvwYWADXxWA2wINe5amtrvRDq6uraHLv3+WU+8fxb/dTf/Tsv+ULHl7zkJS/57pYH6ryD62op9hReAQ4ysyozM+AYYD5wCzA9vmY6cHMJdMtyH2lPQQhRfvQr9oDu/piZ3QDMARqBucCVwFDgejP7NMFwfLTYukHLRvPraxpwd4LdEkKI8qDoRgHA3S8ELmx1eDNh1VBShg/qz9CB/diwuZG1m7YyompAqVUSQoiioYzmdlCzHSFEuSKj0A5qtiOEKFdkFNoh02zndZXQFkKUGTIK7ZApdbFUEUhCiDJDRqEdxo9QW04hRHkio9AOarYjhChXZBTaoTn6SBvNQogyQ0ahHTJ7Cm+sbaCpqbi1oYQQopTIKLTD4AGVjKzqz9ZtzoqNm0utjhBCFA0ZhQ5oiUDSZrMQonyQUeiA8Wq2I4QoQ2QUOkBtOYUQ5YiMQgeoLacQohyRUeiAnTK5CkpgE0KUETIKHaBSF0KIckRGoQNUPlsIUY7IKHTAjtWDMIPl6xto3NZUanWEEKIopDYKZjbSzKZ2hzI9if6VFYwdOpAmh2XrlcAmhCgPEhkFM5tlZsPNbBQwD7jazC7pXtVKT3OzHe0rCCHKhKQrhWp3Xwd8GLja3WuBd3WfWj0DNdsRQpQbSY1CPzOrAT4G3NqN+vQoFIEkhCg3khqFi4A7gRfd/Qkz2w1Y2H1q9QzUbEcIUW70S/i6pe7evLns7i+Vw56Cmu0IIcqNpCuFyxIe61NkchW0UhBClAudrhTM7GDgEGCsmX0566nhQGV3KtYT0EpBCFFu5HIfDQCGxtcNyzq+Djipu5TqKYwZOpB+FcbKjVto2LqNQf37vB0UQpQ5nRoFd78fuN/M/uDui4ukU4+hssIYN3wQr63ZxBtrG5g0ZkipVRJCiG4l6UbzQDO7EpiULePu7+wOpXoS40cEo/D62k0yCkKIPk9So/A34Argd8C27lOn5xFyFVarLacQoixIahQa3f3ybtWkh6JmO0KIciJpSOpMMzvLzGrMbFTm1q2a9RDGV6vZjhCifEi6Upge/56XdcyB3bpWnZ7HeBXFE0KUEYmMgrvv2t2K9FTUbEcIUU4kMgpm9qn2jrv7H7tWnZ5HcwKb9hSEEGVAUvfRAVn3BwHHAHOAPm8URlb1Z2C/CtY3NLJhcyNDByZ9y4QQoveR1H30xezHZlYNXNMtGvUwzIzxIwazaMVGlq7ZxJ7jhuUWEkKIXkq+PZrrgT27UpGeTI2a7QghyoSkewozCdFGEArhvRW4Pt9BzWwEIRFuSjzvGcAC4K+ErOmXgY+5++p8x+hK1GxHCFEuJHWQ/zTrfiOw2N2XFDDupcAd7n6SmQ0AqoBvAPe4+8VmdgFwAXB+AWN0GZlmO1opCCH6OoncR7Ew3vOESqkjgS35Dmhmw4EjgKviube4+xrgRGBGfNkM4IP5jtHVaKUghCgXzN1zv8jsY8BPgFmAAYcD57n7DakHNNsPuBJ4DtgXmA2cA7zm7iOyXrfa3Ue2I38mcCZATU1N7cyZM9Oq0Ex9fT1VVVU5Xzf3jc18/8HVTN1hABce2ZLInVS+0PElL3nJS74r5adNmzb
b3ae1+6S757wB84Adsh6PBeYlkW3nXNMILqgD4+NLge8Ba1q9bnWuc9XW1noh1NXVJXrdgjfW+cTzb/Wjf3JfXvKFji95yUte8l0pD9R5B9fVpNFHFe6+POvxSvKPXFoCLHH3x+LjG4D9gWVmVgMQ/y7vQL7otEQfbcoYLCGE6JMkvbDfYWZ3mtlpZnYa8E/gtnwGdPc3gFfNbHI8dAzBlXQLLTWWpgM353P+7mDYoP4MG9iPhq1NrKnfWmp1hBCi28jVo3kPYJy7n2dmHwYOI+wpPAr8uYBxvwj8OUYevQScTjBQ15vZp4FXgI8WcP4up2bEINYv28DrazcxcsiAUqsjhBDdQq6Q1J8TQkVx9xuBGwHMbFp87v35DOruTxL2FlpzTD7nKwY11YN5YdkGlq5p4G3jq0utjhBCdAu53EeT3P2p1gfdvY6QZFY2jFezHSFEGZDLKAzq5LnBXalIT6dGzXaEEGVALqPwhJl9tvXB6Pef3T0q9UwyEUhKYBNC9GVy7SmcC9xkZp+gxQhMAwYAH+pGvXocO2X6KqjZjhCiD9OpUXD3ZcAhZnY0oXgdwD/d/d5u16yHUaNmO0KIMiBpP4X7gPu6WZceTcZ9tGxdA01NTkWFlVgjIYToevLNSi47BvWvZNSQAWzd5qzYsLnU6gghRLcgo5ACNdsRQvR1ZBRSoBLaQoi+joxCCtRsRwjR15FRSIFWCkKIvo6MQgpaSl1opSCE6JvIKKRgfMxVeE0rBSFEH0VGIQXNpS6UwCaE6KPIKKRg3PBBmMHy9ZvZuq2p1OoIIUSXI6OQgv6VFewwbCDuIbNZCCH6GjIKKWmOQNJmsxCiDyKjkJLmXAVtNgsh+iAyCinRSkEI0ZeRUUiJmu0IIfoyMgopaclV0EpBCNH3kFFIScYoKFdBCNEXkVFIyfhqlboQQvRdZBRSMmboQPpXGqs2bmHzNi+1OkII0aXIKKSkosIYNzysFlbWbyuxNkII0bXIKOTB+BiWunKTjIIQom8ho5AHNTGBbUW96h8JIfoWMgp5kElgWyH3kRCijyGjkAd7jRsKwEOvNtCoaqlCiD6EjEIeHD+1hgkjB7NkXSM3znmt1OoIIUSXIaOQBwP7VXLeeyYDcMndL7Bpi9xIQoi+gYxCnrx/6nh2G9GPN9Y1cPUji0qtjhBCdAkyCnlSUWGcOnUYAJfP+g+rN24psUZCCFE4MgoFsO+4gRy+5xjWNzTyy/teLLU6QghRMDIKBXL+cW8B4JpHF/PqqvoSayOEEIUho1AgU3aq5oP7jWfLtiYuufuFUqsjhBAFUTKjYGaVZjbXzG6Nj0eZ2d1mtjD+HVkq3dLylXdPZkBlBf948jWefX1tqdURQoi8KeVK4RxgftbjC4B73H1P4J74uFew86gqPnnwRNzh4tufL7U6QgiRNyUxCmY2ATge+F3W4ROBGfH+DOCDRVarIL5w9B4MG9iPBxeu4KGFK0qtjhBC5EWpVgo/B74GZNeIGOfuSwHi3x1KoFfejBwygP8+ancAfnj7fJqa1GtBCNH7MPfiXrzM7ATgfe5+lpkdBXzV3U8wszXuPiLrdavdvc2+gpmdCZwJUFNTUztz5sy8damvr6eqqqrL5Dc3Ol+4401WbWri3AOrOXyXwUUdX/KSl7zkkzBt2rTZ7j6t3Sfdvag34IfAEuBl4A2gHvgTsACoia+pARbkOldtba0XQl1dXZfL/+XxxT7x/Fv9sB/d4w1bG4s+vuQlL3nJ5wKo8w6uq0V3H7n71919grtPAj4O3OvupwK3ANPjy6YDNxdbt67gI/tPYM8dhvLqqk38+d+vlFodIYRIRU/KU7gYONbMFgLHxse9jn6VFc0JbZfdu5B1DVtLrJEQQiSnpEbB3We5+wnx/kp3P8bd94x/V5VSt0I45q078I5Jo1hdv5Xf3P+fUqsjhBCJ6UkrhT6DmXH+e8Nq4aqHFvHG2oYSaySEEMmQUegmaieO5Li37UjD1iYuvUf
lL4QQvQMZhW7kvOMmU1lh/PWJV3lx+fpSqyOEEDmRUehGdh87lI8fsDNNDj+6Y0Gp1RFCiJzIKHQz57xrT6oGVHL3c8uoe7nX7p0LIcoEGYVuZodhg/jM4bsB8L+3zc8k8AkhRI9ERqEInHnEboweMoA5r6zhrueWlVodIYToEBmFIjB0YD/OPmZPAH58x/M0bmvKISGEEKVBRqFInPKOXZg4uor/vLmR6+uWlFodIYRoFxmFIjGgXwXnvWcyAP/3rxeo39JYYo2EEKItMgpF5Ph9ath3QjVvrt/MVQ8uKrU6QgjRBhmFImJmXPDetwLwmwde4vX1jWrGI4ToUfQrtQLlxsG7j+aoyWOZteBNvnjHCs675w4mjhrCpDFVTBo9hImjW+7vOHwQFRVWapWFEGWEjEIJ+N6JU/jGTU8z75VVrNvcxIJl61mwrG0ZjIH9Kpg4uioYitFVTBozJBqOKsZXd97VTQgh8kFGoQTsPKqKaz59ILNnz2aPvafyysp6Xl65kcUrN7JoRT2LV27k5ZX1rNiwmReWbeCFZRvanGNQ/wqO220wU/bdxsB+lSX4L4QQfREZhRJTPbg/+0yoZp8J1W2eW9+wlcUr61kcjcbLKzayeGU9i1Zu5M31m/nHgo288KtH+MUp+7HHDsNKoL0Qoq8ho9CDGTaoP1N2qmbKTm0NxuzFqznrj4/x3NJ1HP+Lh/jm8W/l1IMmYqY9CCFE/ij6qJdSO3EkPzt2NCfVTmBzYxPfuvlZPj2jjhUbNpdaNSFEL0ZGoRczuH8FP/3ovvzq/+3P8EH9uPf55Rz38we47/nlpVZNCNFLkVHoAxw/tYY7zj2Cg3YbxYoNWzj9D0/w7ZufoWHrtlKrJoToZcgo9BHGjxjMtZ85iK+/9y30rzT++Ohi3n/ZQzz7+tpSqyaE6EXIKPQhKiqM/zpyd24661B2HzuEhcs38KFfPcJvH3hJmdNCiETIKPRBpuxUza1fPJxPHjSRLdua+MFt8/nk7x/jjbUNpVZNCNHDkVHoowweUMn3PjiFq6ZPY/SQATz84kqOu/QBbn96aalVE0L0YGQU+jjHvHUcd5x7BEdNHsua+q187s9z+NoN89i4WaW7hRBtUfJaGTB22ECuPu0A/vjoYv73tvlcX7eExxet4pCaCpb2f725ntKwQf1LraoQosTIKJQJZsb0QyZx8O6jOfu6uTz/xnpeXgnXPjO3+TVjhg5gYjQQu44ewsQxLYX4hstgCFEWyCiUGXuNG8bNXziUmfOW8sC8hTT0G9ZcW2nFhi2s2LCF2YtXt5EbNWRAi7GI5b03rdrCHvVbqa6SwRCiryCjUIYM7FfJSbUT2JVl1NbWAtDU5Cxb38DLK2LxvZUbWZx1f9XGLazauIW5r6zZ7lwX3HMXI6r6M3H0EHbNlPmO/SAmjR7CiKr+qsckRC9CRkEAIcehpnowNdWDOXj30ds95+4sX7+ZRSs2Npf1fnnFRua/uoLlm5w19VtZU7+Gea+uaXPe4YP6seuYIc09IYLRCC6qTY1NBW14FyrvrtwNIVojoyByYmaMGz6IccMHcdBuLQZj9uzZ7L///ry5fnMwFLEnRPNqY8VG1jU0Mm/JWuYt6SCz+qY7C1OuAPnB/YzdHnkwrGrGZJoZhftjhw7UCkeUJTIKoiDMjB2GD2KH4YN4x66jtnvO3Vm5cQsvrwiri+xVxiur6tm8pZGKyvyjopu2NeUtv63J2dTYxLOvr+PZ19e1eb5qQGVwiUVjsWvcgJ80Zgg7DJPBEH0XGQXRbZgZY4YOZMzQgUybNKrN87Nnz27e08iHQuTdnVmP1lE9YY/tVzfReK2p38r8peuYv7StwRjcv5KJo6vwrQ0MeezhvMY3M0ZWNHDSwDc4dI/RCgcWPQYZBVGWmBnDB1aw/y4j2X+XkW2eX1O/pdlALIod74J7rJ5VG7fw/Buxp/bKNQXp8a9Fs+lXYdROHMlRk3fgyL3G8taaYVqJiJIhoyB
EO4yoGsB+VQPYb+cRbZ5bW7+VV1bV89Szz/GWt0zO6/xbGp1bHnmGhRv6M+eV1Ty2aBWPLVrFj+54nnHDB3LkXmM5cq8dOGzPMVQP1ipCFA8ZBSFSUl3Vn32qqtmybAC1E9u6xZIyYM1QamtrWVu/lYdeXMGsBcu5/4U3WbZuM9fXLeH6uiVUVhhv33kER00ey1GTd2DvmuFUVGgVIboPGQUhSkx1VX+On1rD8VNrcHfmL13PrBeWc/+CN5m9eDV18fbTu15gzNCBHLHXGI6avAOVGxpp3NZEvwI264VoTdGNgpntDPwR2BFoAq5090vNbBTwV2AS8DLwMXdvm1orRB/GzNh7/HD2Hj+cs47ag3UNW3nkxZXc/8JyZi14k6VrG7hxzmvcOOc1APrdeQc7j6oKkVGZXJAxIbR2wsjB9JfBECkpxUqhEfiKu88xs2HAbDO7GzgNuMfdLzazC4ALgPNLoJ8QPYbhg/pz3JQdOW7Kjrg7C5dvYNaC5Ty4cAXzl6xixaYmFq0Im+Hw5naylRXGhJGD2802nzCyqjT/kOjxFN0ouPtSYGm8v97M5gM7AScCR8WXzQBmIaMgRDNmxl7jhrHXuGGcecTuzJ49m7dN3Y9XVoXcj8Ur61mUlUD4+tpNLF5Zz+KV9TzQ6lwVBv0rjIp/3JG3Pk1NTZIvofyw/vB4/hHdHWKlTPU3s0nAA8AU4BV3H5H13Gp3bxMraGZnAmcC1NTU1M6cOTPv8evr66mqyn/GJHnJ92T5Lduc5Ru3sXRDI0s3bOONDY28sWEbS9dvY0X9NpryHln0BKoHGr//wLi8ZKdNmzbb3ae191zJNprNbCjwd+Bcd1+XNC7b3a8ErgSYNm2alyr5SfKS783yW7c18XjdbN7+9rfnPf7cuXMlX0L5J+c+WdD3pyNKYhTMrD/BIPzZ3W+Mh5eZWY27LzWzGmB5KXQTohzoX1nBoH4VVA3I/xIg+dLKD+zXPaHJRQ9NsLAkuAqY7+6XZD11CzA93p8O3Fxs3YQQotwpxUrhUOCTwNNm9mQ89g3gYuB6M/s08Arw0RLoJoQQZU0poo8eAjpa9xxTTF2EEEJsjzJbhBBCNCOjIIQQohkZBSGEEM3IKAghhGimpBnNhWJmbwKLCzjFGGCF5CUvecmXmfxEdx/b7jPuXrY3oE7ykpe85MtRvqOb3EdCCCGakVEQQgjRTLkbhSslL3nJS75M5dulV280CyGE6FrKfaUghBAiCxkFIYQQzcgoCCGEaEZGocwws8FmNrnUepQCMzsnybEOZCvN7Cddr1VyzGxKicdvU86+vWNF0mVIKcYtB8rOKJjZjmb2ATN7v5ntmFLWzOxUM/t2fLyLmb0jgdzTZvZUR7cU4+9lZveY2TPx8VQz+2YK+fcDTwJ3xMf7mdktSeWjzI+SHOtA9gQzm2tmq8xsnZmtN7N1Kcb+sZkNN7P+8X1YYWanplB/ejvHTksi6O7bgFpL2je2Hczs0MzFLH6PLjGziSlOcYWZPW5mZ5nZiDx1qDKzb5nZb+PjPc3shITiX094rKOxC/38MLNDzOw5YH58vK+Z/TqFfN6fgZkNMbOKeH+veB3pn0b/QjGz3c1sYLx/lJmdne93oUO6IyOup96AzxAa+PwBmAG8DJyRQv5y4FeErnEAI4EnEshNjLcfx9s+8XYx8O0U498PvAOYm3XsmRTys4HqVvJPpXwP57RzLNE5gBeBqcSotzw+vyfj3w/Fz28UMC+B3CnATGA1ocNf5nYf8K8U4/8syn0S+HDmlkL+KUIvkX3j/XOA+1O+B3sCP4zv5bXAsSnl/wp8LfO9AQZn3tdOZN4LXAYsA36RdfsD8Hh3f36tzvEYsHMBv4G8P4P4+6kCdgJeBW4itBRO/V3OOuf8ePtC0veQ0AdnD+A/wP8BtxWiQ+tbSXo0l5DzgLe7+0oAMxsNPAL8PqH8ge6+v5nNBXD31WY2IJeQuy+O4x3q7odmPXWBmT0MfDfh+FX
u/niryWpjQlmARndfm89k18w+B5wF7NZqdTMMeDjhaV4l/IDzjYPOzMreB1zn7qsS/i+PAEsJtWJ+lnV8PeHCkJRRwErgnVnHHLix/Ze3odHd3cxOBC5196vMrL3VS4e4+8K4OqwjXJjfHlcv3/CWfuedsbu7n2xmp8TzbUqw+nk9jvcBwoUxw3rgSynUz/fz2w53f7WV3LYU4oV8Bubu9bE75GXu/uPMtSBf3P2t8Tp0UEKRJndvNLMPAT9398sK1aE15WYUlhC+yBnWEy5USdlqZpWECwFmNhZoSiE/xMwO89B9DjM7BEjjG11hZrtnjX8S4WKXlGfM7P8BlWa2J3A24YKZhGuB2wmz1Auyjq9391UJz/E14DYzux/YnDno2/fq7oyZZvY8sAk4K77/DbmEolFeDByccJyOznN6IfLAejP7OnAqcET8LiV2P5jZVOB04HjgbuD97j7HzMYDj5LMOG0xs8G0fId2J+uzaA93nwfMM7Nr3X1rUn3bIa/PrxWvxt+NxwnZ2URXUkIK+QzMzA4GPgF8Oh5LfQ2N7qo93f1f8bPY4u7/TCi+NRr06cD747GudWF15bKjp9+APwJzge8AFwJzgCuALwNfTiD/CYL7YAnwA2AB8NEU49cC8whuq5cJS8H9U8jvBvwLqAdeAx4CJqWQr4p6P0GY+f0AGJTyPdwdGBjvH0X4UY5IKHsX4cJ1UXz/LwQuTDn+SKAy3h8C7JhA5qH4dz2wLuu2HliXYuwJBJfBcoIr5e/AhBTyO8bv2uHx8S7Ap1LIP0BwXQ1u57lPJjzHsQQ35JvAn+P38KiEsocSjNELwEvAIuCl7v78WsmPiXovi5/Dn4BRxfgMgCPj7//8+Hg34Bcp9f9s/P39Jz7eE7gnhfzehBXiKfHxrsAFaXTIdSurjGYzu7Cz5939ogTneAuhl7QRPsw0s5TMOYYTlqJr08pG+SFAhbuvz/nijs9RCQxx98QbvVHuSWAaMAm4k/Ajmezu70sgW+fu0/LQ9cOdPe/J3CYFY2Z3E1ZM18RDpwKfcPdjE8qfATzo7gvzHP9cd/95q2PnuPulKc+TcVcY8G93T1R+Oc7yv0RwITW7bDy6YxPIVxEuyLu4+5lxtTrZ3W9Nofuh7v5wrmMdyFYCd7r7u5KO19XE3887gMfc/e3x2NPuvk+KcwwmvIcLukXHcjIKhWJmBwHPZi7GZjYM2NvdH0soPxD4COGC2rzsdPdEewoWwievJsxwfwvsT5gl3JVQ/lrgvwk/6Mym8yXunjjU0szmeNhX+RqwyaNPM/MFzyF7MXBvUn2z5K6Od3cADgHujY+PBma5e6dGI+s8uwNL3H2zmR1F2PT+o7uvSSj/pLvvl+tYJ/LfBQ4jBB3MBh4EHvDgnkkiP8fd9291LNF730pmp6hD9nfwgQRyj7n7gWnGaiX/V8L//Sl3nxIvbo8mff/iOdp7D9oc60T+FsKqKvGEzMxmEt1t7eHuH0hxrsfc/cDM52Zm/QjBG1MTyr8f+CkwwN13NbP9gO+m0SEXZbWnYGZ7AV+l7UX5nR3JtOJywoU4w8Z2jnXGzcBawg+jUz9uB5zh7pea2XsIF8jTCUYi6UV2b3dfZ2afAG4Dzo+6pIm/z/g0P0V6n+bnga+Z2WZgK2Gm6u4+vDMhj758M7s1/g9L4+MaQjRYUv4OTDOzPYCrCKucawkbn0nIhFBeFx+fQth4ToS7Z0KZBxPcCOcBPwcqO5OL7/f/A3a17UOIh6UZP57rR8DJwLO07Ic5wTWVi/ss5GrcyPZ7QnMSDp/PJndG74MJE4KxZvblrKeGk+P9a0UD8HRc9W3MHHT3szuR+Wn8+2GC++lP8fEpBPdbGu43s28Ag83sWELwxswU8t8hrDRmAbj7k2a2a0odOqWsjALwN8Iewu9IF7GQwTxraeXuTdHSJ2WCux+Xx7jN48e/7wOudvd5SX9Ukf4W4qo/CPzS3beaWdql4umE1cYP3H1R/EL+KYc
MAO4+LOVYrZmUMQiRZcBeKeQLjdw4A/glIQzQCZv0ZyQVthA1dCgwlLC39VXCaiEXXRU9BeGzn+zu+UxKMquEbBegs300Vmek3uTOYgDhfetHMIYZ1gEnJTwHwD/jLTHufj+AmX3P3Y/IemqmmSUxptlcQNikfhr4L8Lk7Hcp5NuLIOxad09XblD09Bswu0D5Gwkbq/3j7RzgHynkrwT2KWD8zKpgIWHTeFia/ynq/hrhi2gEF8KDRf4MRhJmOkdkbilkf0nYxziNEH1xOyE0MKn8Y4TZ3TPArvFYzhh34Efxb+Kggg7OMwd4nLDBfhQpN/m76P2/HRha7HHj2HlvcmedY2IX6DGYYBjTys0Hdst6vCsxZ6mI7+FVhFXjU4RN6suAK7pyjLLYUzCzUfHu2YQvZOvlb6KQSjPbgbDz/06Cdb4HONfdlyeUf46QdLIojp9xnyT1J1YA+xEiPtbEDcOd3D3tbDH7nP3cPWeug5ld7+4fM7OnaWdmkuR/MLPPEAzpBELk1UEEn3LSmWZm0/nw+PABd78phezehFXOo+5+XVzlnOzuF+eQe5rgInzME/quOznXMMK+wmHAx4Bl7n5YDpmH3P0wM1vP9u99IvdbPMdlUXYnQuLWPWz/G+jMfZI5xzjgf4Hx7v7e+H4e7O5X5ZLNOkdem9xZ8mMJoc1vAwZl6Z/oO1SIT97MjiNM7F6KhyYB/+Xud6bQv73fz1pCNOD3Pcemfdys/x/g3YT38E7ge+6eNrS34zHKxCgsInwQmTXXdv+0u++W4ByVwAx3T5WW3+ocE9s77jG5LeE5RhJmCNk/iMRLWDM7nrY/qJwb3WZW4+5LC/kf4g/iAMLFYD8LkVwXufvJSfUvBdGPfiYhhLKeeDEmxUU5nmcKwaAdSXDBvEpYqX27O/RuNXZnCVru7n9McI7bCavV/3H3faPrdK4njJyJbrt7PW7yWijPcJS7/yOJfJS5i5CV/VWCgZ8OvOnu5yeUn02Y1M3yPKJ/YrDIW+LD5z2lG87MfkxwXV8bD308/l0HHObu729XsIiUxZ6Cu+8KzRt8ZxFmaU7w516R8BzbzGysmQ1w9y35qpKnHNDxTJuEPl0zu4Lgdjqa4Mc8ieDOyIlHX34aA9YODe7eYGaY2UB3f95SFOezEP11GfBWgo+5EtiY66LcySon0UrN3c8DzjOzm939xKT6tsOPCBu6vyCUR0mUCJa10u1Iv5wrXXefEc91jrcKYbWERQGBMe5+vYXkLzzsz6TZm7swe2UXV7sXAv9IcY7RHrKQz/Hg67/fQjJkUgr1ye8JTCZMqvY1M5IY1CxaVzV42swedvdDrZM6UGb2c3c/t6NIqCQrnaSUhVHIYgbBIv8iPj4lHvtYQvmXgYdjBEh25ELSjNx/0jLDHETwSS4gzNyTcA4tM+2jMzPthLIAh7j7VDN7yt0vMrOfkbBEQzuui+anSD5bXhJnh/8A7jaz1YQSCkn5JWFm9TfCTPtTBHdcLjIXvasJRjBNFnszBRoE3P14C1m4ewGTzWxBQsMwm+1XutudlpBElZTpQOu8htPaOdYeG6P7J7NRfBDB9ZGU9gpwpr0GZd6vpXHV+zphkpSUvLP6owE7ipBAdhuhJtRDhKTYpAw1swM9hrFbKKg5ND7XmRs3kxvz005e0yWUm1GY7O77Zj2+z8wSxYhHXo+3CraPgEhE6yWqme1PiEBISkEzbUJ5AYB6C6URVhIMU0688Mgh3P1D8e53zOw+Qp7EHSnP8aKZVXqoWnq1meX8QXtLxNIw4DfAKuAvwA3uvizN+K0xsyvd/cyErz2ScAF5mXCB39nMpudy/2VWugXq2RVhrV8mhPHubqFm11jSRf7UmdklhDBiB77I9rWUkvB9M6sGvkJYNQ4nXf2lLxJ88psJocV3At9LKHsSYT9mrrufHvdY0kQOQSjK+XszG0r4DqwDPmMhIfWHHQm5++zowv5sIS7sJJSbUZhrZge5+78BzOxAkhd
zwxNkPKfBQ92aA1KIFDrTvjXK/4QQCeOk/1IXhJkdRqj7cnXcNNyJsPGehPo4034y+maXkqJ2VPz8LrJQQ+hkguthiReW4fqbFK+9BHi3x0xUC3kz1xHKn3SImb0lTgDa3eT2ZHkCBYe1xu/rkQT3iQFJVzoZvgh8i7AnYIRIus+nkMdbsp/XEtygqXD3eoJR+B9ryepPukm7yUMYeqOFqgTLSbdKw92fAPaJhs18+8TJ63PIdoULOydlsdGcwczmE77Qr8RDuxDCzJpI4FuOs9v2/HlJffrZSTcVhIiW0e7+niTyrc51JHGmnc8XJG6YDfI8S23kQ1x+TyOs2PaKq5W/tfKxdiY/kZCbMIAwO6wGfu3uL6bUY0fgowRX1LBcn3tXEd12U3Mda0fuSg9lIe5r52lP+v0rlHgRPZ62yZ9J3aeFjP01D1VJM1FU2+EJoqfiefLO6rfQt+EbhO/NV4ANhHLgqQolWp7BHlH2N4TrRr4u7JyU20qhkMQxCBEPGQYRSlakKV2d7YJpJOwx/D2NAoXMtC2Es32FUDflsxaaBB3uKWrPFMiHgLcTVim4++sWQjQT4e6L40phF8L7lmqmaqH898kEt8cNhKX4cynk9yJkIbcuEZH0olxnZlfR4h/+BAncJxn3lLunnhlnsHbqR3msGWVm73T3e9tKtWEmMSOYdNWBMzoUUlEgU2OsLu24rcg7q9/dz4p3rzCzO4DhnjIcvJBgj0hBLuwklNVKoTsws/vd/ciUMsMIM7wNKeUKnWkXXHumEMzscXd/h7XUTxoSx0+ap3EULc2RjNBsJadPPkv+YuAv7v5kHuoT95+uoG1BuER+8bg6+zwh+s0IkUi/9oRhjWY2iHai55K4P6ylflSW2n5GfO4id++0WGR8Xc5VTQ75gt6/rsDMniXk+lxLyOq/P83/ZWYfICRdQmjOk6ZERfN7mPV3KHCju787zXm6k3JbKRSEbR8aWEHwBSdu6WkhTv0aQrMWzGwF4aL2TMJTFDTTpoDaM13E9XH5O8LMPksoEfHbFPI/Iw+ffAZ3vyD3qzql0d0vz1c4Xvwvibd8+CNhD+Cy+PgUwvcpZ5/kzlwcSQxC5HYze7enLGiYRd7vn3VdUborCCvrp4AHoksykQs1TioOIGRjA5xtZoe4e+KWpBQQ7BF1KCh5LwkyCunIDg1sJHy5Pt2pxPZcSejbcB80z3yvJBT6SsIWd3eL9YosffPyQmrPdAUZt806wt7Ot4E0m7z9PatcsLu/YMXtkTvTzM4i9FRInBFvHWSBZ8knnX0XGj1XKP8GbrKQWZ+4oGEWeb1/ka4qSjeKlonItwiTu1kJZd8H7OfuTQBmNoNQwyqNUcgEe/yYFtdhmmCPPxM26k8gK3kvhXxO5D4qImY2r9WPut1jnch/lZA8cywhfO0M4Fp3v6xTwRb5Y4FvEuKs7yIUZzvN3Wcl/icKwNove5xm6f57wsU12yffL+1GX75YyIxvjXuOjHhryQLPRNpk61+fYpPxDwR3UXb03PQsX3dnsh3lmQCQ5MJuZi8RCuo97XlcOPJ9/1qd4wHfvihdu8c6kf9K1sNBhIvr/IwrLYfsU4QM7FXx8ShCZnRil1qclH2OkNmecQFenjQCysxmu3tt9u8mHxd2p2PIKCQnzko/R4tPcRbwm6SbnWZ2E8H1k92kZZq7fzCFDseSVffE3e9OKhvlC6o9kw+W1d+Z0Gw8wzDgYU8Yd12oT77UWMxczXWsHbnMSqM/LdFzTtjwfs7dp6TQ4bvAG4TvoBEM0zB3/3EC2TuB92ZmyqUgRhAe7+4vxce7EhrXvzXP8w0EbvEEEYDR7XoxcB/hvTsC+Lq7/yXFeNcTXIDZK50R7p4ogdbM/u3uB8XP4heETecb3H33pDrkHENGITlm9jvCD3NGPPRJYJu7fyaH3DXu/kkLIamTaLmo3U+o/bO6+7RuTpLrEE9eDz/f8asJ1VEL6e9ccrpgUvAk8AXfvkf3r3N
t9FsH9aYyeLraWW0a5bR3rAPZPxAM++3k0WPbzD7V3nFPUSbCuqAoXavzjQQed/c9E76+hrCvYIQCiW+kHK9Qb8EJhNXFzrQk713k7rd0KpgC7Smk44BWH969CX26tfGHPZ0QipYpqEa83ymdLP2T+nSzk5Xa1P4heT38vPCQC7GWMCtKTRf65AvlcsKk4Nfx8SfjsU4nBVl8mpDNWh0fryFBP4bWF30L1XoHdfDyXGyL4Zh/Ibynp5C8t8iieBsQb2nJTtQcRGhrO4cUZSLc/Q4L5SnyKkrX6rtUSdjn6tR9186kakn8O97MxqecVBWaQFtQ8l4StFJIgZnNIdTU/098vBth6dbpTNzMzibMMHcj9DNofoqUPtVCsPYLAib2Z5aKrpwpF6hHQbO8LJm8enTHcMifAeMJ2bQTCf7wpLWzMLNJhDpHhxK+Aw8Tyr+/nOIcwwnf27x7hMfzVAPXpIgcyuTafJnQV+GzlrLPc6vvUiOhdHmnuUbWftJgBk8T+WOFJ9COJXTtm8T2uR6Jmz3l1FFGITlm9k7gD2y/dD09E02UQP5yd/9cgTpkJ6+NIfiDkyavXU+I/MmE1KXyZ5Y7+U4KsuQL7dE9j7Cq+5eH/r5HA6d4wtpLhWJm0whFBTNh0GsJLWLzyjOI7rin0uwHWIlzbQql0AmOhVpfD9I21yNVEmxnyH2UjtHAFMKP+kRCKGni2V4XGITm5DXCj3MAYcMqUfIapQ9pLAgLWbk/IvSnNtKHRBbKeYT37KU49kRCe9Kk3ExhPbq3uvtKM6swswp3v89Cz+XEWMjtuBwYFy+qU4EPuPv3E4j/HjjL3R+M5zqM8D1MGj2WnWtQSSiB3mm9n3Yoaa5N3AeaxPZGPY37q9BVbZUn7B2RLzIK6fiWu/8tLp+PJSzlL6eld213U2jyWkH+zB7Aj4H3u/v8nK/sBtz9noy7gmAU0jZZKbRH9xoLGbAPAH82s+WkK7MCIUb/PGIhP3d/ykI9oCRGYX3GIETZh+J+V1Kyyz43AovdfUlHL+6AkuXamNk1wO6EXiaZWbqTrnR2odxqZu9z99u6awAZhXRkvgjHE+LFbzaz7xRx/EKT1w4EPmVm2/kzM5tvRdywzZdlpTAIFmsDWdv6QbtbaLKSqCcF8IiZ7ePuT+epyomE2kNfIoSSVpNjk7Qdqtz98VaT66SG5XELGenXES6GJwOzMhuxuTZcPZSUGEfLhvPCVJoHLiSUW9/ZzP5MzLXJ4zz5MI1QO6mUPvdzgG+Y2WbySyDMiYxCOl6LP4p3AT+KPuL2God0OXGJfKsVViai0IKApaYu+pT/wfYhkUkvyvlyJHAv0F6rRCdhoyLCBv9pFpK4UvfodveNWQ9ndPjCzlkRZ9eZicVJhJLaSdgv/m1dFuMQEkSxmdnHCIXnZhH+98vM7Dx3vyHh+Lj73XFvJ5Nrc44XIdcm8gwhmzrp+9XleBf0NcmFNppTECMfjiNkdC6MMcv7eP61YNKOP4dQ1THv5LXejLUt6gZZhd16Oh1tMibYXOyKrneZc+1GS2mV1YQQ01PTRB/lS9y/Otbdl8fHYwmb5mmjt6bS1q/f3RODTBTSfoSqptmTki5rhdnJ2F3RUyPZWDIKvQcz+xXwBw+NOsoOMxtUyvBZC72MryZkpP6WUNf+glyTAjMb7qFcc7u9lr0ECXzR9ViRJqw0hpBeSFaVUOC7SUNrzexpz+o+aKGG0jxv1ZEwxzl+T9jYfpaW8t1FmRhY6GHSBg+9ort77OyeGm1yjdKExeYcS0ah92BmzxH6+y5m+wYbPX0voEswsxcJTXYeJGy2Ppw21r/A8ee5+75m9h5CuY1vAVcnyFO51d1PiG6jTEHFDDnzVLrSqORr2KLs3wkulOyM/n3dvU2vhg7kf0K4oF8XD51MCElNHE1jZs+5+95JX9+VmNkZwIPuns9eSFfp0O25RjIKvYh83Q99CTPbhVBM7FBC1co1xYpRt5Ya+JcSCqHdZGZ
z3f3tCeWvIRizB939+RTjFmRUWp0rL8MWZZ9s/V63dyzHOT5C+OwMeMDdb0oqG+WvAn7mKZojdRUW6kYdRghFnk24ID/oefbnyFOHbs810kZzL6KcLv7tYWYTCBeUwwkN1J8FHiqiCrPN7C5C/fuvx3DgNMXhriZcVC6Lvv25hIvKpZ0JufsJ8W/iuvudkDEo7yMYg3kp4vw3mdlh3lK76VBa+gMkwkOSVSGJVjOAR83sDfLYrC8Ed/82NM/WP0sI7f05IeeiWHR7rpFWCqLXYGZNwBPA/7r7zUUe24AJhFo5L7n7GgsVZ3fyFC0ZLfQ5PoBQt+a/Cc3g39K51HbyIwnl07MbrCTqPBflrya0cN2VYFgrCauenI2KzGw/wkW5mnAxXkUovZ7oomRdkHwYXYhfplVL0GJMmMzsm4RJyVCCQX+IYNSLFo1kBZRPTzyGjILoLZjZvoSZ9hGEHIuFhJaIVxVp/NlJLp6dyN8DDAEeJbgeHspE4iSU/wwhTn0CIYHqIEKJhzS1dyoIETSFGLbhAO6+LqlMlHuRApMPzezertxUTTn2HFp6q99PKD1flMAH68Ly6bmQ+0j0GqKr4z+EngyHE/pRHAEUxSgA/zazAwqI/nqK0Dp0CqHcxRoze9Tdk7pgziGsMv7t7keb2VuAi9Io4O5NcW9iLws9n3NioeR7e8cz50zaXrQrkg+ft5CBPZPi5qrgoa/4MMLE5Fjgt2a2zN0P6+6xCc2AioKMgug1mFkdMBB4hLB0P6LI+yxHA/9tZi8Tor/SJp99CcBCqYrTCXsMOxL+pyQ0uHuDmWFmA2Pc+uQ0/0BHqw06TzzrqoSprkg+HBxlsxvdp0kgzBsLPdYPJyQzTgNeJaz4up1ifs/lPhK9BjMb6+5d2o825fgFRX+Z2RcIF5VaQlhxJhLp3oTyNxGMybmEi/hqQt/q9yWRj+d4mpbVxn6Z1Ya7n5z0HPnSB5IP/0n8zIAnPGFzpd6GjILoNRSaPNVFOmSXLh8LDPXkpcvPI1xUZnuOGv4JznUkYcP3DnffkkLuCXc/wEIXuAPdfXPSsFIrrMJql2Ch/eYXaZvR3O1ZxXH8AYRcIYAFfdEwyCiIXkOhyVNdMH5z6XJ338vMxgN/8xw9lrtw/EuBv7r7IwWcI+/VhpndT6ywmsnNMLNnkm5yWhc0iInhl1fRNvqoGFnFRxIqor5McB3uTIj8SRz91RvQnoLoTezu7h/JenxRnPEWi0JLlxfKHOCbccZ+E8FA1KU5gbt/KN79TiyZUE2oOpqEQiqsQugn8SDwL5K3AG1Ng7v/Ik/ZQrkEeLe7L4DmldN1BHdgn0FGQfQmCk6eKpBCS5cXhLvPAGbEchcfIVTq3cUTNp3P0I4LbCdCYbxcFFJhFbqmQcylccV2F9tvVndZQbhO6J8xCHHMFyx0j+tTyCiI3sR/A3+0lsb3q4HpRRz/eiusdHlXsQehcf0kIFW5B2vbva8/ybv3fZ5QYfUtZvYawZB8IsXwXdEgZh+C2/CdZBXEI0fZ7i6iLpbZuCY+/gSh3EWfQnsKosfTKk7eCAlgEMJCPUWcfFfosZyQCQxwlxexdLmF1psfJuRp/BW4yd3XpDzHk0QXWNa+wFNJwmrNrNLdt1keFVaj/HrCZ5d3gxgzex6YmmZzvauw0D/l84Q8BSMEDfza03Xf6/FopSB6Axm//WRCOOXNhB/lqYQfZjH1+DShvMNfCMloxWQRoQ/CboTchqkWOr+leQ8KcYEtMrM7CAYpURhtNu4+LLq+tivTkZJ5wAiCcS4q8eJ/Sbz1WbRSEL0GC8XoPpKZocZN3r95YX2P89FjKqHs80eAJe7+riKN+1ngbAorc/FVwkX5WOCHBBfYte5+WQLZwYTucx8nlNy+FfhLZo8ngXx7iXOPuPsxKfSfRSi//QRFanSTVWKiXZImL/YWtFIQvYldgGy3wRaCX73YLAfeAFYSirsVi7MpvMzFT83sWEL55cnAt5O6wGI
5jusJeysjgUsJuSJJq4QWXKaDtq1Ai0GmxMTn49/sPYX64qvTvcgoiN7ENYTm8TcRZm4fIv9exakxs88RVghjgRuAz3px6/oXXOYCQp9jIK+9kBirfzLwXsJsPU0d/4L1L0Y+QjtjLoYQ7dYqJ+UCM3sY+G6xdepOZBREr8Hdf2BmtxNKRQCc7u5zi6jCROBcL2JTlVYsMbMRhNpBd5vZauD1JILWBX2eYyG9JwmrhfPcfWPnEm0oRP+H3P2wdv6P1JvVBTCkVUj0IbQEPfQZtKcgRC8k3zIXBY453FOWy+7kXEXXv1DMbH9CGG81wTCtBc4oUo5E0dBKQYheSCFulFbJa2OAYQnrN+0YXXcF1z4qhRuoECw0RzrSQyvT4YQJddFqbhWTilIrIIQoHjF57Xzg6/HQAELyWhJ+G+W2AnhozPPxrtaxJ+Lu24AT4/11fdUggFYKQpQbhdRvKrT2UW/nYTP7JSFPo3k/Re4jIURvppDktUJrH/V2Dol/s6ONilVio2jIKAhRJliY4t9aQP2mQmsf9Wrc/ehS61AMFH0kRBlhofn8+YR2lgbcmSt5zdr2aB5M2I/cCKl6NPdqzOzb7R13d+UpCCF6LY8Ca9z9vBQyHdWe+iTFrT1VarLzMgYRMp3nl0iXbkMrBSHKCDN7jtBOcjHbb5YmqZLaI2pP9RRi1dRb3P09pdalK9FKQYjy4r0FyPaU2lM9hSpCxdo+hYyCEGVEpo5PnpS09lSpaVUttYJQDPF7pdOoe5D7SAiRmFjqIVN76oEi154qKWY2ERhJ+P9HALe5uzqvCSFEOWJmZwOfBW4kbLR/EPhtkl4UvQkZBSGESICZPQUcnKkOGxP/Hu1rTXZU+0gIIZJhwLasx9visT6FNpqFECIZVwOPxY12CO6jq0qnTvcg95EQQiQkbrQfRlgh9MmNdhkFIYQQzWhPQQghRDMyCkIIIZqRURAiYmb/Y2bPmtlTZvakmR3YjWPNMrNp3XV+IfJF0UdCAGZ2MKHq5f7uvjn2Lh5QYrWEKDpaKQgRqAFWuPtmAHdfEVtVftvMnjCzZ8zsytioJjPT/z8ze8DM5pvZAWZ2o5ktNLPvx9dMMrPnzWxGXH3cYGZVrQc2s3eb2aNmNsfM/mZmQ+Pxi83suSj70yK+F6KMkVEQInAXsLOZvWBmvzazI+PxX7r7Ae4+hdBc5oQsmS3ufgRwBaHHwOeBKcBpZjY6vmYycGXMel0HnJU9aFyRfBN4l7vvD9QBXzazUYSCc2+Lst/vhv9ZiDbIKAgBuPsGoBY4E3gT+KuZnQYcbWaPxQqZ7wTeliV2S/z7NPCsuy+NK42XgJ3jc6+6+8Px/p8IMe7ZHATsTWgK/yQwHZhIMCANwO/M7MNAfVf9r0J0hvYUhIi4+zZgFjArGoH/AqYC09z9VTP7DqHjVobN8W9T1v3M48xvq3UiUOvHBtzt7qe01sfM3gEcA3wc+AJ9rEG86JlopSAEYGaTzWzPrEP7AQvi/RXRz39SHqfeJW5iA5wCPNTq+X8Dh5rZHlGPKjPbK45X7e63AedGfYTodrRSECIwFLjMzEYAjcCLBFfSGoJ76GXgiTzOOx+Ybma/ARYCl2c/6e5vRjfVdbG9I4Q9hvXAzWY2iLCa+FIeYwuRGpW5EKKbMLNJwK1xk1qIXoHcR0IIIZrRSkEIIUQzWikIIYRoRkZBCCFEMzIKQgghmpFREEII0YyMghBCiGb+P8kVVSrFyj3jAAAAAElFTkSuQmCC\n", "text/plain": [ "
    " ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "freq.plot(20,cumulative=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.Leer pdfs, docs y txt con Python y generar un corpus" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Es necesario instalar para este caso, la librería PyPDF2 y la librería word" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#pip install docx2python" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#pip install PyPDF2\n", "## pip install word\n", "## pip install pdfminer.six" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#pip install pdfminer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Importamos las clases que nos permiten leer pdf y docs" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "import os\n", "#import PyPDF2\n", "import pdfminer as pdfm\n", "from docx2python import docx2python\n", "#from PyPDF2 import PdfFileReader\n", "from nltk.corpus.reader.plaintext import PlaintextCorpusReader" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generamos una función que nos devuelva el contenido de texto de un fichero de texto." 
] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "def obtener_texto(nombre_txt):\n", "    # Open the file in binary mode; the context manager closes it on exit\n", "    with open(nombre_txt, 'rb') as archivo:\n", "        return archivo.read()\n", "\n", "ruta = %pwd" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b'Five months.\\n\\nThat\\'s how long it\\'s been since Mass Effect: Andromeda launched, and that\\'s how long it took BioWare Montreal to admit that nothing more can be done with the ailing game\\'s story mode. Technically, it wasn\\'t even a full five months, as Andromeda launched on March 21.\\n\\nSEE ALSO: \\'Mass Effect: Andromeda\\' reviews are in: Bad game is bad\\n\\nBioWare confirmed the decision in an update on the Mass Effect website. The Andromeda corner of the game\\'s universe won\\'t be tossed, but continuing stories will be relegated to special multiplayer missions and other forms of media.\\n\\n\"Our last update, 1.10, was the final update for Mass Effect: Andromeda,\" the note reads. 
\"There are no planned future patches for single-player or in-game story content.\"'" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "txt1" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "txt1 = obtener_texto(ruta + '/datos/reading/ejemplo_feed.txt')" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "from io import StringIO\n", "\n", "from pdfminer.converter import TextConverter\n", "from pdfminer.layout import LAParams\n", "from pdfminer.pdfdocument import PDFDocument\n", "from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter\n", "from pdfminer.pdfpage import PDFPage\n", "from pdfminer.pdfparser import PDFParser\n", "\n", "output_string = StringIO()\n", "with open(ruta +'/datos/reading/ejemplo-una-linea.pdf', 'rb') as in_file:\n", " parser = PDFParser(in_file)\n", " doc = PDFDocument(parser)\n", " rsrcmgr = PDFResourceManager()\n", " device = TextConverter(rsrcmgr, output_string, laparams=LAParams())\n", " interpreter = PDFPageInterpreter(rsrcmgr, device)\n", " for page in PDFPage.create_pages(doc):\n", " interpreter.process_page(page)\n", "\n", "txt2 = output_string.getvalue()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Ejemplo de documento PDF. Contiene textos en negrita , en cursiva and texto subrayado\\n . 
\nAdemás incluye una línea de texto con estilo Título\\n\\n \\n\\nTítulo del documento.\\n\\nÉste es el tercer párrafo'" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "txt2" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "# Run only on Linux/macOS: drops the last 4 characters of the extracted text\n", "txt2 = txt2[:-4]" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[[['\\t\\tEjemplo de documento PDF.',\n", " '\\t\\tContiene textos en negrita , en cursiva and texto subrayado.',\n", " '\\t\\t',\n", " '\\t\\tAdemás incluye una línea de texto con estilo Título',\n", " 'Título del documento.',\n", " 'Éste es el FINAL del documento.']]]]" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "txt3 = docx2python(ruta + '/datos/reading/ejemplo-una-linea.docx')\n", "txt3.body" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "txt3 = txt3.body[0][0][0][0]" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'\\t\\tEjemplo de documento PDF.'" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "txt3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We save the result locally in a folder called _micorpus_." ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "nuevocorpus = 'micorpus/'\n", "if not os.path.isdir(nuevocorpus): # does nuevocorpus already exist?\n", "    os.mkdir(nuevocorpus)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We save the 3 texts loaded above, one file per text."
] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[b'Five months.\\n\\nThat\\'s how long it\\'s been since Mass Effect: Andromeda launched, and that\\'s how long it took BioWare Montreal to admit that nothing more can be done with the ailing game\\'s story mode. Technically, it wasn\\'t even a full five months, as Andromeda launched on March 21.\\n\\nSEE ALSO: \\'Mass Effect: Andromeda\\' reviews are in: Bad game is bad\\n\\nBioWare confirmed the decision in an update on the Mass Effect website. The Andromeda corner of the game\\'s universe won\\'t be tossed, but continuing stories will be relegated to special multiplayer missions and other forms of media.\\n\\n\"Our last update, 1.10, was the final update for Mass Effect: Andromeda,\" the note reads. \"There are no planned future patches for single-player or in-game story content.\"',\n", " 'Ejemplo de documento PDF. Contiene textos en negrita , en cursiva and texto subrayado\\n . \\nAdemás incluye una línea de texto con estilo Título\\n\\n \\n\\nTítulo del documento.\\n\\nÉste es el tercer párrafo',\n", " '\\t\\tEjemplo de documento PDF.']" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "files = [txt1,txt2,txt3] # List of the objects to iterate over.\n", "files" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [], "source": [ "for idx, f in enumerate(files):\n", "    # Decode bytes (txt1 was read in binary mode) so the file holds the text, not its b'...' repr\n", "    contenido = f.decode('utf-8') if isinstance(f, bytes) else f\n", "    with open(nuevocorpus + str(idx) + '.txt', 'w', encoding='utf-8') as fileout:\n", "        fileout.write(contenido)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We build the corpus. The corpus reader splits each file into paragraphs, sentences and words internally."
] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "corpus = PlaintextCorpusReader (ruta +'/datos/reading/', '.*')" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['', '<', 'html', '>', '<', 'head', '>', '<', 'title', '>', 'Sample', 'Web', 'Page', '', '<', 'META', 'http', '-', 'equiv', '=\"', 'Content', '-', 'Type', '\"', 'content', '=\"', 'text', '/', 'html', ';', 'charset', '=', 'iso', '-', '8859', '-', '1', '\">', '', '<', 'body', 'bgcolor', '=\"#', 'ffffff', '\"', 'text', '=\"#', '000000', '\">', '<', 'h1', 'class', '=', \"'\", 'header', \"'>\", 'Main', 'heading', '', '<', 'p', '>', 'This', 'is', 'a', 'very', 'simple', 'HTML', 'document', '', '<', 'p', '>', 'Improve', 'your', 'image', 'by', 'including', 'an', 'image', '.'], ['', '<', 'img', 'src', '=\"', 'http', '://', 'www', '.', 'mygifs', '.', 'com', '/', 'CoverImage', '.', 'gif', '\"', 'alt', '=\"', 'A', 'Great', 'HTML', 'Resource', '\">', '<', 'p', 'class', \"='\", 'link', \"'>\", 'Add', 'a', 'link', 'to', 'your', 'favorite', '<', 'a', 'href', '=\"', 'http', '://', 'www', '.', 'dummies', '.', 'com', '/\">', 'Web', 'site', '.', '<', 'BR', '>', '<', 'B', '><', 'I', '>', 'This', 'is', 'a', 'new', 'sentence', 'without', 'a', 'paragraph', 'break', ',', 'in', 'bold', 'italics', '.', '', '']]]\n" ] } ], "source": [ "print(corpus.paras(corpus.fileids()[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Tokenizing text that is not in English" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Bonjour M. Adam, comment allez-vous?', \"J'espère que tout va bien.\", \"Aujourd'hui est un bon jour.\"]\n" ] } ], "source": [ "from nltk.tokenize import sent_tokenize\n", "texto = \"Bonjour M. Adam, comment allez-vous? J'espère que tout va bien. 
Aujourd'hui est un bon jour.\"\n", "\n", "print(sent_tokenize(texto,\"french\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Getting synonyms (WordNet)" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "enjoying or showing or marked by joy or pleasure\n", "['a happy smile', 'spent many happy days on the beach', 'a happy marriage']\n" ] } ], "source": [ "from nltk.corpus import wordnet\n", "\n", "syn = wordnet.synsets(\"happy\")\n", "print(syn[0].definition())\n", "print(syn[0].examples())" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "large Old World boas\n", "[]\n" ] } ], "source": [ "syn = wordnet.synsets(\"python\")\n", "print(syn[0].definition())\n", "print(syn[0].examples())" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['happy', 'felicitous', 'happy', 'glad', 'happy', 'happy', 'well-chosen']\n" ] } ], "source": [ "sinonimos = []\n", "\n", "for syn in wordnet.synsets('happy'):\n", "    for lemma in syn.lemmas():\n", "        sinonimos.append(lemma.name())\n", "\n", "print(sinonimos)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Extracting lemmas with WordNet" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "increase\n" ] } ], "source": [ "from nltk.stem import WordNetLemmatizer\n", "\n", "lema = WordNetLemmatizer()\n", "print(lema.lemmatize('increases'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same word can be a noun or a verb depending on the context. By default `lemmatize` treats the word as a noun; with the `pos` argument we can ask the lemmatizer for the lemma of the word taken as a verb."
] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "play\n" ] } ], "source": [ "print(lema.lemmatize('playing', pos=\"v\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we ask for the lemma under each part of speech?" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "play\n", "playing\n", "playing\n" ] } ], "source": [ "print(lema.lemmatize('playing', pos=\"v\")) # verb\n", "print(lema.lemmatize('playing', pos=\"n\")) # noun\n", "print(lema.lemmatize('playing', pos=\"a\")) # adjective" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Extracting stems for a language other than English" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('arabic', 'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'hungarian', 'italian', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'spanish', 'swedish')\n" ] } ], "source": [ "from nltk.stem import SnowballStemmer\n", "print(SnowballStemmer.languages)" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "palabra\n" ] } ], "source": [ "castellano = SnowballStemmer('english')\n", "print(castellano.stem(\"Palabra\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That did not work: the English stemmer returns the Spanish word unstemmed, only lowercased. We need the Spanish stemmer instead." ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "palabr\n" ] } ], "source": [ "castellano = SnowballStemmer('spanish')\n", "print(castellano.stem(\"Palabra\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 10. 
Extracting stems (word roots) and lemmas (canonical forms)" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "stone\n", "speak\n", "bedroom\n", "joke\n", "lisa\n", "purpl\n", "----------------------\n", "stone\n", "speaking\n", "bedroom\n", "joke\n", "lisa\n", "purple\n" ] } ], "source": [ "from nltk.stem import WordNetLemmatizer\n", "from nltk.stem import PorterStemmer\n", "\n", "steams = PorterStemmer()\n", "lemas = WordNetLemmatizer()\n", "\n", "print(steams.stem('stones'))\n", "print(steams.stem('speaking'))\n", "print(steams.stem('bedroom'))\n", "print(steams.stem('jokes'))\n", "print(steams.stem('lisa'))\n", "print(steams.stem('purple'))\n", "print('----------------------')\n", "print(lemas.lemmatize('stones'))\n", "print(lemas.lemmatize('speaking'))\n", "print(lemas.lemmatize('bedroom'))\n", "print(lemas.lemmatize('jokes'))\n", "print(lemas.lemmatize('lisa'))\n", "print(lemas.lemmatize('purple'))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 4 }