{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "fmMXeKn1LMg6" }, "source": [ "# Introducción a Pandas\n", "\n", "Aprenderemos a utilizar pandas para el análisis de datos:\n", "\n", "* Introducción a Pandas\n", "* Series\n", "* DataFrames\n", "* Valores perdidos\n", "* GroupBy\n", "* Fusionado, Unión y Concatenación\n", "* Operaciones habituales\n", "* Entrada y salida de datos" ] }, { "cell_type": "markdown", "metadata": { "id": "jxO7Z9FkLMg8" }, "source": [ "___" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "y-LXx26pLMhA" }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "jug1VdGVLMhP" }, "source": [ "# Series" ] }, { "cell_type": "markdown", "metadata": { "id": "yVDWxB48LMhS" }, "source": [ "### Creando Series\n", "\n", "Conversión de una lista, Array Numpy o diccionario a Series:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 90 }, "executionInfo": { "elapsed": 3452, "status": "ok", "timestamp": 1603894629851, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "OX96ojT5LMhT", "outputId": "5d2d2c60-ba4e-41b4-95b6-c1069b3b4b9b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a', 'b', 'c', 'd']\n", "[10, 20, 30]\n", "[10 20 30]\n", "{'a': 10, 'b': 20, 'c': 30}\n" ] } ], "source": [ "labels = ['a','b','c','d']\n", "my_list = [10,20,30]\n", "arr = np.array([10,20,30])\n", "d = {'a':10,'b':20,'c':30}\n", "\n", "print (labels)\n", "print (my_list)\n", "print (arr)\n", "print (d)" ] }, { "cell_type": "markdown", "metadata": { "id": "v2Sq9PwVLMhd" }, "source": [ "### Usando listas" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 90 }, "executionInfo": { "elapsed": 3442, "status": "ok", "timestamp": 1603894629856, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "DsMZIG03LMhe", "outputId": "16bdd2a6-e25c-4919-bff4-a699bfb7b5c7" }, "outputs": [ { "data": { "text/plain": [ "0 10\n", "1 20\n", "2 30\n", "dtype: int64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(data=my_list)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 90 }, "executionInfo": { "elapsed": 3440, "status": "ok", "timestamp": 1603894629859, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "EwDFqu8dLMhn", "outputId": "8af00a73-b31a-4f70-fea5-8e6685c41574" }, "outputs": [ { "data": { "text/plain": [ "a 10\n", "b 20\n", "c 30\n", "dtype: int64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Asignamos etiquetas a la serie\n", "pd.Series(data=my_list,index=labels)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 90 }, "executionInfo": { "elapsed": 3436, "status": "ok", "timestamp": 1603894629860, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "eoOx6XETLMh1", "outputId": "54da8f6f-2aed-4ff1-b873-3c934b4f467e" }, "outputs": [ { "data": { "text/plain": [ "a 10\n", "b 20\n", "c 30\n", "dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(my_list,labels)" ] }, { "cell_type": "markdown", "metadata": { "id": "-wMWtO5dLMiB" }, "source": [ "### Usando Arrays" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 90 }, "executionInfo": { "elapsed": 3432, "status": "ok", "timestamp": 1603894629862, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "0VjQgPSCLMiD", "outputId": "19c6ecc9-3726-4c78-8c4d-2c040b462823" }, "outputs": [ { "data": { "text/plain": [ "a 10\n", "b 20\n", "c 30\n", "dtype: int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(arr,labels)" ] }, { "cell_type": "markdown", "metadata": { "id": "zt0cKz6VLMiL" }, "source": [ "### Usando Diccionarios" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 90 }, "executionInfo": { "elapsed": 3427, "status": "ok", "timestamp": 1603894629862, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "yNdM9M7GLMiN", "outputId": "07086ed9-b180-4d8f-cd6a-d096257b984f" }, "outputs": [ { "data": { "text/plain": [ "a 10\n", "b 20\n", "c 30\n", "dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.Series(d)" ] }, { "cell_type": "markdown", "metadata": { "id": "cjXe2F3QLMiW" }, "source": [ "### Usando índices\n", "\n", "La clave para usar Series, es entender sus índices. Pandas usa los índices en formato numéricos o texto. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 219 }, "executionInfo": { "elapsed": 3422, "status": "ok", "timestamp": 1603894629863, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "P9a_CLqULMiY", "outputId": "8d6df7c4-2c8e-4ba4-f4f1-7a54f52ab082" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "USA 1\n", "Germany 2\n", "USSR 3\n", "Japan 4\n", "dtype: int64\n", "\n", "USA 1\n", "Germany 2\n", "Italy 5\n", "Japan 4\n", "dtype: int64\n" ] } ], "source": [ "ser1 = pd.Series([1,2,3,4],index = ['USA', 'Germany','USSR', 'Japan'])\n", "ser2 = pd.Series([1,2,5,4],index = ['USA', 'Germany','Italy', 'Japan']) \n", "print (ser1)\n", "print ()\n", "print (ser2)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "executionInfo": { "elapsed": 3417, "status": "ok", "timestamp": 1603894629864, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "-n7cmXCDLMig", "outputId": "7fa28dd2-3598-447f-84b0-a4ebb254fd62" }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ser1['USA']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 127 }, "executionInfo": { "elapsed": 3412, "status": "ok", "timestamp": 1603894629865, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "9WcRzJ6nLMip", "outputId": "6551d30d-c118-4363-9633-8bb05df605dd" }, "outputs": [ { "data": { "text/plain": [ "Germany 4.0\n", "Italy NaN\n", "Japan 8.0\n", "USA 2.0\n", "USSR NaN\n", "dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Las operaciones se realizan en función del índice\n", "ser1 + ser2" ] }, { "cell_type": "markdown", "metadata": { "id": "NwwAYVLWLMi0" }, "source": [ "# DataFrames\n", "\n", "Los DataFrames están directamente inspirados del lenguaje de programación R. Podemos ver un DataFrame como un conjunto de objetos Series unidos." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "Mz2gNWG0LMi1" }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from numpy.random import randn\n", "np.random.seed(123)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3405, "status": "ok", "timestamp": 1603894629866, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "LjB98a6xLMi-", "outputId": "63a06b6d-76fc-4d66-9102-ffd9005c429d" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
A-1.0856310.9973450.282978-1.506295
B-0.5786001.651437-2.426679-0.428913
C1.265936-0.866740-0.678886-0.094709
D1.491390-0.638902-0.443982-0.434351
E2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "A -1.085631 0.997345 0.282978 -1.506295\n", "B -0.578600 1.651437 -2.426679 -0.428913\n", "C 1.265936 -0.866740 -0.678886 -0.094709\n", "D 1.491390 -0.638902 -0.443982 -0.434351\n", "E 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "VgvtElioLMjC" }, "source": [ "### Selección e indexación\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 127 }, "executionInfo": { "elapsed": 3400, "status": "ok", "timestamp": 1603894629867, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "VgAS8dqVLMjC", "outputId": "9303f891-e0b2-40cd-cf89-8056c5b30244" }, "outputs": [ { "data": { "text/plain": [ "A -1.085631\n", "B -0.578600\n", "C 1.265936\n", "D 1.491390\n", "E 2.205930\n", "Name: W, dtype: float64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['W']" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "executionInfo": { "elapsed": 3398, "status": "ok", "timestamp": 1603894629868, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "YXbNlYlVLMjH", "outputId": "b04b5ea5-0b01-4a59-8d77-55059b220505" }, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(df)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3395, "status": "ok", "timestamp": 1603894629869, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "bTFIsnHILMjL", "outputId": "4ab719f6-69a1-465c-a2cd-c63e861eb40c" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WZ
A-1.085631-1.506295
B-0.578600-0.428913
C1.265936-0.094709
D1.491390-0.434351
E2.2059300.386186
\n", "
" ], "text/plain": [ " W Z\n", "A -1.085631 -1.506295\n", "B -0.578600 -0.428913\n", "C 1.265936 -0.094709\n", "D 1.491390 -0.434351\n", "E 2.205930 0.386186" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Selección de varias columnas por su nombre\n", "df[['W','Z']]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 127 }, "executionInfo": { "elapsed": 3393, "status": "ok", "timestamp": 1603894629871, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "u60zG6DwLMjQ", "outputId": "ea269a0f-0a3e-48e5-a013-6daa539fc2b1" }, "outputs": [ { "data": { "text/plain": [ "A -1.085631\n", "B -0.578600\n", "C 1.265936\n", "D 1.491390\n", "E 2.205930\n", "Name: W, dtype: float64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Pandas también permite sintaxis tipo SQL, no obstante no se recomienda su uso.\n", "df.W" ] }, { "cell_type": "markdown", "metadata": { "id": "xQNUv2ACLMjX" }, "source": [ "¿Qué tipo de dato hay en la columna W?" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "executionInfo": { "elapsed": 3389, "status": "ok", "timestamp": 1603894629872, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "6yxWoZkaLMjZ", "outputId": "10c00571-fba1-45b9-fcbd-bf866acc3b39" }, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(df['W'])" ] }, { "cell_type": "markdown", "metadata": { "id": "FCDFbLIMLMjg" }, "source": [ "Como vemos la columna W es simplemente una Serie" ] }, { "cell_type": "markdown", "metadata": { "id": "kcw8Fw5VLMjg" }, "source": [ "**Creando una nueva columna:**" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3387, "status": "ok", "timestamp": 1603894629873, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "roORd3QTLMji", "outputId": "f7f732ae-b6f4-4758-ad71-17eee2d681ab" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZnueva
A-1.0856310.9973450.282978-1.506295-0.802652
B-0.5786001.651437-2.426679-0.428913-3.005279
C1.265936-0.866740-0.678886-0.0947090.587050
D1.491390-0.638902-0.443982-0.4343511.047408
E2.2059302.1867861.0040540.3861863.209984
\n", "
" ], "text/plain": [ " W X Y Z nueva\n", "A -1.085631 0.997345 0.282978 -1.506295 -0.802652\n", "B -0.578600 1.651437 -2.426679 -0.428913 -3.005279\n", "C 1.265936 -0.866740 -0.678886 -0.094709 0.587050\n", "D 1.491390 -0.638902 -0.443982 -0.434351 1.047408\n", "E 2.205930 2.186786 1.004054 0.386186 3.209984" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['nueva'] = df['W'] + df['Y']\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "91QIiSP8LMjs" }, "source": [ "**Eliminar columnas**" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3383, "status": "ok", "timestamp": 1603894629874, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "fJfMZPovLMjs", "outputId": "1170094d-e7dc-4982-e7f8-f7fa73921941" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
A-1.0856310.9973450.282978-1.506295
B-0.5786001.651437-2.426679-0.428913
C1.265936-0.866740-0.678886-0.094709
D1.491390-0.638902-0.443982-0.434351
E2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "A -1.085631 0.997345 0.282978 -1.506295\n", "B -0.578600 1.651437 -2.426679 -0.428913\n", "C 1.265936 -0.866740 -0.678886 -0.094709\n", "D 1.491390 -0.638902 -0.443982 -0.434351\n", "E 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.drop('nueva',axis=1)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3382, "status": "ok", "timestamp": 1603894629875, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "So7O_jwSLMjx", "outputId": "593ce53b-586f-4101-c810-9bb20375b338" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZnueva
A-1.0856310.9973450.282978-1.506295-0.802652
B-0.5786001.651437-2.426679-0.428913-3.005279
C1.265936-0.866740-0.678886-0.0947090.587050
D1.491390-0.638902-0.443982-0.4343511.047408
E2.2059302.1867861.0040540.3861863.209984
\n", "
" ], "text/plain": [ " W X Y Z nueva\n", "A -1.085631 0.997345 0.282978 -1.506295 -0.802652\n", "B -0.578600 1.651437 -2.426679 -0.428913 -3.005279\n", "C 1.265936 -0.866740 -0.678886 -0.094709 0.587050\n", "D 1.491390 -0.638902 -0.443982 -0.434351 1.047408\n", "E 2.205930 2.186786 1.004054 0.386186 3.209984" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# A menos que lo especifiquemos con inplace, no se elimina nada\n", "df" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3378, "status": "ok", "timestamp": 1603894629876, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "Dr4_DWy7LMj0", "outputId": "3c11747b-c04b-432a-94e4-404c5375fb3c" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
A-1.0856310.9973450.282978-1.506295
B-0.5786001.651437-2.426679-0.428913
C1.265936-0.866740-0.678886-0.094709
D1.491390-0.638902-0.443982-0.434351
E2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "A -1.085631 0.997345 0.282978 -1.506295\n", "B -0.578600 1.651437 -2.426679 -0.428913\n", "C 1.265936 -0.866740 -0.678886 -0.094709\n", "D 1.491390 -0.638902 -0.443982 -0.434351\n", "E 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.drop('nueva',axis=1, inplace=True)\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "dH2fb0DlLMj5" }, "source": [ "**Eliminar filas**" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "executionInfo": { "elapsed": 3375, "status": "ok", "timestamp": 1603894629878, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "3wqkZCqKLMj7", "outputId": "4853108d-8aa8-4ea7-a9cc-ad195004ee9a" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
A-1.0856310.9973450.282978-1.506295
B-0.5786001.651437-2.426679-0.428913
C1.265936-0.866740-0.678886-0.094709
D1.491390-0.638902-0.443982-0.434351
\n", "
" ], "text/plain": [ " W X Y Z\n", "A -1.085631 0.997345 0.282978 -1.506295\n", "B -0.578600 1.651437 -2.426679 -0.428913\n", "C 1.265936 -0.866740 -0.678886 -0.094709\n", "D 1.491390 -0.638902 -0.443982 -0.434351" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.drop('E',axis=0)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "executionInfo": { "elapsed": 3372, "status": "ok", "timestamp": 1603894629878, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "xQNv6njwLMj_", "outputId": "7f20700d-70d7-474a-8864-5b50e7fa0963" }, "outputs": [ { "data": { "text/plain": [ "(5, 4)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape " ] }, { "cell_type": "markdown", "metadata": { "id": "G1mJCtcxLMkG" }, "source": [ "**Selección de filas**" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 109 }, "executionInfo": { "elapsed": 3369, "status": "ok", "timestamp": 1603894629879, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "2oJUrPdpLMkG", "outputId": "ff7f0335-a3a1-40dc-fe36-f96eb5463548" }, "outputs": [ { "data": { "text/plain": [ "W -1.085631\n", "X 0.997345\n", "Y 0.282978\n", "Z -1.506295\n", "Name: A, dtype: float64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc['A']" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 109 }, "executionInfo": { "elapsed": 3366, "status": "ok", "timestamp": 1603894629880, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "Tu_TgtP1LMkJ", "outputId": "f1661de4-fd04-4089-e2ba-da7adf7d7b6e" }, "outputs": [ { "data": { "text/plain": [ "W -1.085631\n", "X 0.997345\n", "Y 0.282978\n", "Z -1.506295\n", "Name: A, dtype: float64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[0]" ] }, { "cell_type": "markdown", "metadata": { "id": "CjNa8f2jLMkL" }, "source": [ "**Selección de un subset de datos**" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "executionInfo": { "elapsed": 3361, "status": "ok", "timestamp": 1603894629880, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "ZLsW-sywLMkN", "outputId": "45b6f5ca-eb3b-4a48-9905-e9e72cee1ddf" }, "outputs": [ { "data": { "text/plain": [ "-2.426679243393074" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc['B','Y']" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 112 }, "executionInfo": { "elapsed": 3360, "status": "ok", "timestamp": 1603894629881, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "YQRLrdnoLMkP", "outputId": "eb3c2280-c772-44df-eee3-7adbfb90555e" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WY
A-1.0856310.282978
B-0.578600-2.426679
\n", "
" ], "text/plain": [ " W Y\n", "A -1.085631 0.282978\n", "B -0.578600 -2.426679" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[['A','B'],['W','Y']]" ] }, { "cell_type": "markdown", "metadata": { "id": "jrk8GYF_LMkS" }, "source": [ "### Selección condicional\n", "\n", "Una importante característica de Pandas es la selección condicional de manera muy similar a Numpy:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3357, "status": "ok", "timestamp": 1603894629882, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "N_0ZAWCyLMkU", "outputId": "41697230-672d-41ac-e42a-8a42ad6f28b5" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
AFalseTrueTrueFalse
BFalseTrueFalseFalse
CTrueFalseFalseFalse
DTrueFalseFalseFalse
ETrueTrueTrueTrue
\n", "
" ], "text/plain": [ " W X Y Z\n", "A False True True False\n", "B False True False False\n", "C True False False False\n", "D True False False False\n", "E True True True True" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df>0" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3355, "status": "ok", "timestamp": 1603894629883, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "FKx4az4nLMkW", "outputId": "2bcd7e54-15f9-45bd-c31e-d244d20be288" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
ANaN0.9973450.282978NaN
BNaN1.651437NaNNaN
C1.265936NaNNaNNaN
D1.491390NaNNaNNaN
E2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "A NaN 0.997345 0.282978 NaN\n", "B NaN 1.651437 NaN NaN\n", "C 1.265936 NaN NaN NaN\n", "D 1.491390 NaN NaN NaN\n", "E 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df [df>0]" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 254 }, "executionInfo": { "elapsed": 3351, "status": "ok", "timestamp": 1603894629883, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "dG9Mz3s-LMkb", "outputId": "97c880d1-78ef-4cc6-b3d7-99913314e406" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A False\n", "B False\n", "C True\n", "D True\n", "E True\n", "Name: W, dtype: bool\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
C1.265936-0.866740-0.678886-0.094709
D1.491390-0.638902-0.443982-0.434351
E2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "C 1.265936 -0.866740 -0.678886 -0.094709\n", "D 1.491390 -0.638902 -0.443982 -0.434351\n", "E 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print (df['W']>0)\n", "df[df['W']>0]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 201 }, "executionInfo": { "elapsed": 3346, "status": "ok", "timestamp": 1603894629883, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "F3BKDc87LMke", "outputId": "ead21a13-f671-4039-d851-e125d21a160e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A False\n", "B False\n", "C True\n", "D True\n", "E True\n", "Name: W, dtype: bool\n" ] }, { "data": { "text/plain": [ "C -0.678886\n", "D -0.443982\n", "E 1.004054\n", "Name: Y, dtype: float64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print (df['W']>0)\n", "df[df['W']>0]['Y']" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "executionInfo": { "elapsed": 3342, "status": "ok", "timestamp": 1603894629884, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "otN-BaiSLMki", "outputId": "18074104-a1e4-4c2b-b4c5-086626db74d9" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
YX
C-0.678886-0.866740
D-0.443982-0.638902
E1.0040542.186786
\n", "
" ], "text/plain": [ " Y X\n", "C -0.678886 -0.866740\n", "D -0.443982 -0.638902\n", "E 1.004054 2.186786" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['W']>0][['Y','X']]" ] }, { "cell_type": "markdown", "metadata": { "id": "FPffXb5nLMkl" }, "source": [ "Podemos concatenar condiciones con | y &. Deberemos encerrar entre paréntesis cada una de las condiciones:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3339, "status": "ok", "timestamp": 1603894629884, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "11-SEKIiLMkm", "outputId": "3d3f0036-36f8-46f1-c36c-490dacc990b4" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
A-1.0856310.9973450.282978-1.506295
B-0.5786001.651437-2.426679-0.428913
C1.265936-0.866740-0.678886-0.094709
D1.491390-0.638902-0.443982-0.434351
E2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "A -1.085631 0.997345 0.282978 -1.506295\n", "B -0.578600 1.651437 -2.426679 -0.428913\n", "C 1.265936 -0.866740 -0.678886 -0.094709\n", "D 1.491390 -0.638902 -0.443982 -0.434351\n", "E 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 81 }, "executionInfo": { "elapsed": 3337, "status": "ok", "timestamp": 1603894629886, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "aW1dWrUHLMkp", "outputId": "6f545660-f0ff-471e-b7d4-2bda3e8bc488" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
E2.205932.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "E 2.20593 2.186786 1.004054 0.386186" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['W']>0) & (df['Y'] > 1)]" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "executionInfo": { "elapsed": 3334, "status": "ok", "timestamp": 1603894629887, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "kHpwycl7LMkq", "outputId": "f5e0799b-7426-47c4-8f46-85c74bee47ae" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
C1.265936-0.866740-0.678886-0.094709
D1.491390-0.638902-0.443982-0.434351
E2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "C 1.265936 -0.866740 -0.678886 -0.094709\n", "D 1.491390 -0.638902 -0.443982 -0.434351\n", "E 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['W']>0) | (df['Y'] > 1)]" ] }, { "cell_type": "markdown", "metadata": { "id": "4RM6j700LMkt" }, "source": [ "### Más sobre índices" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3333, "status": "ok", "timestamp": 1603894629889, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "6QOQBuHfLMkt", "outputId": "c2b5240f-41b8-4bfc-da36-ab5528285117" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
A-1.0856310.9973450.282978-1.506295
B-0.5786001.651437-2.426679-0.428913
C1.265936-0.866740-0.678886-0.094709
D1.491390-0.638902-0.443982-0.434351
E2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "A -1.085631 0.997345 0.282978 -1.506295\n", "B -0.578600 1.651437 -2.426679 -0.428913\n", "C 1.265936 -0.866740 -0.678886 -0.094709\n", "D 1.491390 -0.638902 -0.443982 -0.434351\n", "E 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3330, "status": "ok", "timestamp": 1603894629890, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "iLMj7J6DLMlB", "outputId": "577ffdb5-a693-40f6-c284-9ae293dc0052" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexWXYZ
0A-1.0856310.9973450.282978-1.506295
1B-0.5786001.651437-2.426679-0.428913
2C1.265936-0.866740-0.678886-0.094709
3D1.491390-0.638902-0.443982-0.434351
4E2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " index W X Y Z\n", "0 A -1.085631 0.997345 0.282978 -1.506295\n", "1 B -0.578600 1.651437 -2.426679 -0.428913\n", "2 C 1.265936 -0.866740 -0.678886 -0.094709\n", "3 D 1.491390 -0.638902 -0.443982 -0.434351\n", "4 E 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reseteamos el índice a una secuencia de 0 a n\n", "df.reset_index()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "id": "fYlQRaBhLMlD" }, "outputs": [], "source": [ "nuevoindice = 'CA NY WY OR CO'.split()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3325, "status": "ok", "timestamp": 1603894629891, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "e_3CoAsgLMlE", "outputId": "f51bedc8-a0a1-4426-e6e5-b356bc53ad27" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZEstados
A-1.0856310.9973450.282978-1.506295CA
B-0.5786001.651437-2.426679-0.428913NY
C1.265936-0.866740-0.678886-0.094709WY
D1.491390-0.638902-0.443982-0.434351OR
E2.2059302.1867861.0040540.386186CO
\n", "
" ], "text/plain": [ " W X Y Z Estados\n", "A -1.085631 0.997345 0.282978 -1.506295 CA\n", "B -0.578600 1.651437 -2.426679 -0.428913 NY\n", "C 1.265936 -0.866740 -0.678886 -0.094709 WY\n", "D 1.491390 -0.638902 -0.443982 -0.434351 OR\n", "E 2.205930 2.186786 1.004054 0.386186 CO" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Estados'] = nuevoindice\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "S605C4zILMlH" }, "source": [ "Utilizamos la columna Estados como índice en el dataset" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 238 }, "executionInfo": { "elapsed": 3321, "status": "ok", "timestamp": 1603894629891, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "hugNrpFILMlH", "outputId": "b10a2837-a684-4357-f4b9-e06c90e5a408" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
Estados
CA-1.0856310.9973450.282978-1.506295
NY-0.5786001.651437-2.426679-0.428913
WY1.265936-0.866740-0.678886-0.094709
OR1.491390-0.638902-0.443982-0.434351
CO2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "Estados \n", "CA -1.085631 0.997345 0.282978 -1.506295\n", "NY -0.578600 1.651437 -2.426679 -0.428913\n", "WY 1.265936 -0.866740 -0.678886 -0.094709\n", "OR 1.491390 -0.638902 -0.443982 -0.434351\n", "CO 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_index('Estados')" ] }, { "cell_type": "markdown", "metadata": { "id": "JlbwCO30LMlJ" }, "source": [ "Tenemos que tener en cuenta que si no usamos el argumento inplace, no se aplican los cambios" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3318, "status": "ok", "timestamp": 1603894629893, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "FX5xG8aPLMlK", "outputId": "13e6f94c-5447-4874-d307-77e06ab371db" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZEstados
A-1.0856310.9973450.282978-1.506295CA
B-0.5786001.651437-2.426679-0.428913NY
C1.265936-0.866740-0.678886-0.094709WY
D1.491390-0.638902-0.443982-0.434351OR
E2.2059302.1867861.0040540.386186CO
\n", "
" ], "text/plain": [ " W X Y Z Estados\n", "A -1.085631 0.997345 0.282978 -1.506295 CA\n", "B -0.578600 1.651437 -2.426679 -0.428913 NY\n", "C 1.265936 -0.866740 -0.678886 -0.094709 WY\n", "D 1.491390 -0.638902 -0.443982 -0.434351 OR\n", "E 2.205930 2.186786 1.004054 0.386186 CO" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 238 }, "executionInfo": { "elapsed": 3313, "status": "ok", "timestamp": 1603894629893, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "OD4_Z9ToLMlN", "outputId": "21ff5db8-0f96-46a2-c927-f43a9fa87613" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WXYZ
Estados
CA-1.0856310.9973450.282978-1.506295
NY-0.5786001.651437-2.426679-0.428913
WY1.265936-0.866740-0.678886-0.094709
OR1.491390-0.638902-0.443982-0.434351
CO2.2059302.1867861.0040540.386186
\n", "
" ], "text/plain": [ " W X Y Z\n", "Estados \n", "CA -1.085631 0.997345 0.282978 -1.506295\n", "NY -0.578600 1.651437 -2.426679 -0.428913\n", "WY 1.265936 -0.866740 -0.678886 -0.094709\n", "OR 1.491390 -0.638902 -0.443982 -0.434351\n", "CO 2.205930 2.186786 1.004054 0.386186" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_index('Estados', inplace=True)\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "lYZ1-C6NLMlP" }, "source": [ "### Índices múltiples y jerarquía en los índices" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "id": "CBCL5422LMlQ" }, "outputs": [], "source": [ "# Creamos diferentes 'índices'\n", "outside = ['G1','G1','G1','G2','G2','G2']\n", "inside = [1,2,3,1,2,3]\n", "hier_index = list(zip(outside,inside))\n", "hier_index = pd.MultiIndex.from_tuples(hier_index)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 182 }, "executionInfo": { "elapsed": 3306, "status": "ok", "timestamp": 1603894629893, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "lVm5QYiuLMlR", "outputId": "b9c2af11-4b90-4d41-a560-6c93d9a7cbeb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['G1', 'G1', 'G1', 'G2', 'G2', 'G2']\n", "[1, 2, 3, 1, 2, 3]\n", "MultiIndex([('G1', 1),\n", " ('G1', 2),\n", " ('G1', 3),\n", " ('G2', 1),\n", " ('G2', 2),\n", " ('G2', 3)],\n", " )\n" ] } ], "source": [ "print (outside)\n", "print (inside)\n", "print (hier_index)\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 238 }, "executionInfo": { "elapsed": 3304, "status": "ok", "timestamp": 1603894629895, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "FA7sWMjyLMlT", "outputId": "85e13e7e-345c-4105-b26e-802a1f31969e" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
G110.7373691.490732
2-0.9358341.175829
3-1.253881-0.637752
G210.907105-1.428681
2-0.140069-0.861755
3-0.255619-2.798589
\n", "
" ], "text/plain": [ " A B\n", "G1 1 0.737369 1.490732\n", " 2 -0.935834 1.175829\n", " 3 -1.253881 -0.637752\n", "G2 1 0.907105 -1.428681\n", " 2 -0.140069 -0.861755\n", " 3 -0.255619 -2.798589" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "6zDNnrA0LMlW" }, "source": [ "¿Cómo extraemos los datos en base a este índice doble?" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "executionInfo": { "elapsed": 3301, "status": "ok", "timestamp": 1603894629895, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "FmSfJU-RLMlW", "outputId": "8a19bed9-d9ce-4ba3-98e4-727e13fcfaa0" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
10.7373691.490732
2-0.9358341.175829
3-1.253881-0.637752
\n", "
" ], "text/plain": [ " A B\n", "1 0.737369 1.490732\n", "2 -0.935834 1.175829\n", "3 -1.253881 -0.637752" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Haciendo uso de .loc\n", "df.loc['G1']" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 72 }, "executionInfo": { "elapsed": 3298, "status": "ok", "timestamp": 1603894629895, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "AGhUb6klLMla", "outputId": "4ae0a91c-4a4a-41c9-88a8-fc07d28f64de" }, "outputs": [ { "data": { "text/plain": [ "A 0.737369\n", "B 1.490732\n", "Name: 1, dtype: float64" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc['G1'].loc[1]" ] }, { "cell_type": "markdown", "metadata": { "id": "QbI7ISPQLMld" }, "source": [ "Podemos entender G1 y G2 como una columna extra que se usa para el filtrado.\n", "Además a los índices podemos asignarles nombres" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "executionInfo": { "elapsed": 3293, "status": "ok", "timestamp": 1603894629895, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "FCvzGuSlLMld", "outputId": "61135920-9222-4361-9780-9cfe55b4585a" }, "outputs": [ { "data": { "text/plain": [ "FrozenList([None, None])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index.names" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 269 }, "executionInfo": { "elapsed": 3290, "status": "ok", "timestamp": 1603894629896, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "0kKHeeKILMlg", "outputId": "6f4a2b97-80c1-4f27-ce94-d291175203ad" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
GrupoNúmero
G110.7373691.490732
2-0.9358341.175829
3-1.253881-0.637752
G210.907105-1.428681
2-0.140069-0.861755
3-0.255619-2.798589
\n", "
" ], "text/plain": [ " A B\n", "Grupo Número \n", "G1 1 0.737369 1.490732\n", " 2 -0.935834 1.175829\n", " 3 -1.253881 -0.637752\n", "G2 1 0.907105 -1.428681\n", " 2 -0.140069 -0.861755\n", " 3 -0.255619 -2.798589" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index.names = ['Grupo','Número']\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "Z6hbi_owLMlj" }, "source": [ "Supongamos que queremos obtener aquellos datos cuyo grupo es G1 y su número es 1" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 72 }, "executionInfo": { "elapsed": 3288, "status": "ok", "timestamp": 1603894629898, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "jWoKrjCbLMlk", "outputId": "8f09522a-9fc8-4c47-f194-b00986008b33" }, "outputs": [ { "data": { "text/plain": [ "A 0.737369\n", "B 1.490732\n", "Name: (G1, 1), dtype: float64" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.xs(['G1',1])" ] }, { "cell_type": "markdown", "metadata": { "id": "3lOsw1pcLMln" }, "source": [ "# Valores perdidos" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "id": "tkDYNTrTLMlo" }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "executionInfo": { "elapsed": 3282, "status": "ok", "timestamp": 1603894629898, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "xWvNs5QILMlq", "outputId": "d71872e6-531e-4e9e-b026-c5c5a03993b5" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
01.05.01
12.0NaN2
2NaNNaN3
\n", "
" ], "text/plain": [ " A B C\n", "0 1.0 5.0 1\n", "1 2.0 NaN 2\n", "2 NaN NaN 3" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'A':[1,2,np.nan],\n", " 'B':[5,np.nan,np.nan],\n", " 'C':[1,2,3]})\n", "df" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 81 }, "executionInfo": { "elapsed": 3276, "status": "ok", "timestamp": 1603894629898, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "HNC5I3qOLMlr", "outputId": "6cd59f45-998a-46f0-8482-efa8ac634337" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
01.05.01
\n", "
" ], "text/plain": [ " A B C\n", "0 1.0 5.0 1" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dropna()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "executionInfo": { "elapsed": 3272, "status": "ok", "timestamp": 1603894629899, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "5oeJe9deLMlt", "outputId": "4055e06d-bf60-4754-b2e8-b94060503cff" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
C
01
12
23
\n", "
" ], "text/plain": [ " C\n", "0 1\n", "1 2\n", "2 3" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dropna(axis=1)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 112 }, "executionInfo": { "elapsed": 3268, "status": "ok", "timestamp": 1603894629900, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "vzCvIb7aLMlv", "outputId": "04aebab9-b671-4768-e443-51b39a210dde" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
01.05.01
12.0NaN2
\n", "
" ], "text/plain": [ " A B C\n", "0 1.0 5.0 1\n", "1 2.0 NaN 2" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dropna(thresh=2)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "executionInfo": { "elapsed": 3263, "status": "ok", "timestamp": 1603894629900, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "TEMJGymfLMlx", "outputId": "3442b068-a79b-4a03-f966-460c1911ea9b" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
01.05.01
12.0Valor Rellenado2
2Valor RellenadoValor Rellenado3
\n", "
" ], "text/plain": [ " A B C\n", "0 1.0 5.0 1\n", "1 2.0 Valor Rellenado 2\n", "2 Valor Rellenado Valor Rellenado 3" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fillna(value='Valor Rellenado')" ] }, { "cell_type": "markdown", "metadata": { "id": "q3Mxh_kELMl0" }, "source": [ "Una caso más elaborado (y habitual), sería el de imputar la media de su columna a los NA" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 90 }, "executionInfo": { "elapsed": 3260, "status": "ok", "timestamp": 1603894629901, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "5tcMC_E5LMl0", "outputId": "510892a5-30fb-45a3-8667-c6485bd08dd6" }, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 2.0\n", "2 1.5\n", "Name: A, dtype: float64" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['A'].fillna(value=df['A'].mean())" ] }, { "cell_type": "markdown", "metadata": { "id": "oiIHNo-FLMl3" }, "source": [ "# Groupby\n", "El método groupby permite agrupar filas en base a un criterio y ejecutar operaciones de agregación sobre las mismas." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "executionInfo": { "elapsed": 433, "status": "ok", "timestamp": 1603982809272, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "TJzahYPwLMl3" }, "outputs": [], "source": [ "import pandas as pd\n", "# Generación del dataframe\n", "data = {'Compañía':['GOOG','GOOG','MSFT','MSFT','FB','FB'],\n", " 'Trabajador':['Ana','Carlos','Rosa','Vanesa','Carlos','Sara'],\n", " 'Ventas':[200,120,340,124,243,350]}\n", "df = pd.DataFrame(data)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 238 }, "executionInfo": { "elapsed": 3253, "status": "ok", "timestamp": 1603894629902, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "XPhQulF3LMl5", "outputId": "8b547fac-a3bf-4521-9df7-65072a36e231" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CompañíaTrabajadorVentas
0GOOGAna200
1GOOGCarlos120
2MSFTRosa340
3MSFTVanesa124
4FBCarlos243
5FBSara350
\n", "
" ], "text/plain": [ " Compañía Trabajador Ventas\n", "0 GOOG Ana 200\n", "1 GOOG Carlos 120\n", "2 MSFT Rosa 340\n", "3 MSFT Vanesa 124\n", "4 FB Carlos 243\n", "5 FB Sara 350" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 607, "status": "ok", "timestamp": 1603982836820, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "JYfDNhX1LMl8", "outputId": "8482c386-6a38-4461-fc4b-c5601fdad612" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Agrupamos los datos en base a la columna Compañía\n", "df.groupby('Compañía')" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "executionInfo": { "elapsed": 749, "status": "ok", "timestamp": 1603982864658, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "aBVLkyUTLMl_" }, "outputs": [], "source": [ "# Guardamos el resultado en una variable\n", "grupo = df.groupby('Compañía')" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "executionInfo": { "elapsed": 603, "status": "ok", "timestamp": 1603982866028, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "pv47XnLFLMmB", "outputId": "6b1ed976-d2b5-486f-f4ae-9c55dda8a83d" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Ventas
Compañía
FB593
GOOG320
MSFT464
\n", "
" ], "text/plain": [ " Ventas\n", "Compañía \n", "FB 593\n", "GOOG 320\n", "MSFT 464" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Ahora podemos aplicar funciones sobre la agrupación.\n", "grupo.sum()" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "executionInfo": { "elapsed": 3240, "status": "ok", "timestamp": 1603894629906, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "kDxIpyVzLMmG", "outputId": "4e97facb-75d0-4127-e95b-31be4c53f414" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Ventas
Compañía
FB296.5
GOOG160.0
MSFT232.0
\n", "
" ], "text/plain": [ " Ventas\n", "Compañía \n", "FB 296.5\n", "GOOG 160.0\n", "MSFT 232.0" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Ejecutado todo de una vez\n", "df.groupby('Compañía').mean()" ] }, { "cell_type": "markdown", "metadata": { "id": "dyzrKcUMLMmI" }, "source": [ "Más ejemplos de agregaciones" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "executionInfo": { "elapsed": 3235, "status": "ok", "timestamp": 1603894629906, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "5FKq54W7LMmJ", "outputId": "09f58148-2e8b-4a52-a346-602affe2d6d1" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TrabajadorVentas
Compañía
FB22
GOOG22
MSFT22
\n", "
" ], "text/plain": [ " Trabajador Ventas\n", "Compañía \n", "FB 2 2\n", "GOOG 2 2\n", "MSFT 2 2" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Compañía').count()" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "executionInfo": { "elapsed": 3231, "status": "ok", "timestamp": 1603894629907, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "rxc5VypaLMmN", "outputId": "7db9aff5-17ee-400c-e299-70028a04d647" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Ventas
Compañía
FB75.660426
GOOG56.568542
MSFT152.735065
\n", "
" ], "text/plain": [ " Ventas\n", "Compañía \n", "FB 75.660426\n", "GOOG 56.568542\n", "MSFT 152.735065" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('Compañía').std()" ] }, { "cell_type": "markdown", "metadata": { "id": "uPT9bKAgLMmR" }, "source": [ "Podemos hacer un 'describe' para ver las características de nuestra agrupación de datos" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 3226, "status": "ok", "timestamp": 1603894629907, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "a_TjBKG1LMmS", "outputId": "d6975849-0637-4f76-c573-8d65afea719f" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Ventas
countmeanstdmin25%50%75%max
Compañía
FB2.0296.575.660426243.0269.75296.5323.25350.0
GOOG2.0160.056.568542120.0140.00160.0180.00200.0
MSFT2.0232.0152.735065124.0178.00232.0286.00340.0
\n", "
" ], "text/plain": [ " Ventas \n", " count mean std min 25% 50% 75% max\n", "Compañía \n", "FB 2.0 296.5 75.660426 243.0 269.75 296.5 323.25 350.0\n", "GOOG 2.0 160.0 56.568542 120.0 140.00 160.0 180.00 200.0\n", "MSFT 2.0 232.0 152.735065 124.0 178.00 232.0 286.00 340.0" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grupo.describe()" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 300 }, "executionInfo": { "elapsed": 546, "status": "ok", "timestamp": 1603983114848, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "TkehSFQHLMmT", "outputId": "f290fde3-66e2-407f-b93b-816f59b57c1a" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CompañíaFBGOOGMSFT
Ventascount2.0000002.0000002.000000
mean296.500000160.000000232.000000
std75.66042656.568542152.735065
min243.000000120.000000124.000000
25%269.750000140.000000178.000000
50%296.500000160.000000232.000000
75%323.250000180.000000286.000000
max350.000000200.000000340.000000
\n", "
" ], "text/plain": [ "Compañía FB GOOG MSFT\n", "Ventas count 2.000000 2.000000 2.000000\n", " mean 296.500000 160.000000 232.000000\n", " std 75.660426 56.568542 152.735065\n", " min 243.000000 120.000000 124.000000\n", " 25% 269.750000 140.000000 178.000000\n", " 50% 296.500000 160.000000 232.000000\n", " 75% 323.250000 180.000000 286.000000\n", " max 350.000000 200.000000 340.000000" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Si no nos gusta como se muestra la información podemos usar el método transpose\n", "grupo.describe().transpose()" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 545, "status": "ok", "timestamp": 1603983200794, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "3kwu83ZmLMmV", "outputId": "4cd0c807-cacd-4119-dd39-cbe98141f91b" }, "outputs": [ { "data": { "text/plain": [ "Ventas count 2.000000\n", " mean 296.500000\n", " std 75.660426\n", " min 243.000000\n", " 25% 269.750000\n", " 50% 296.500000\n", " 75% 323.250000\n", " max 350.000000\n", "Name: FB, dtype: float64" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Descripción de la compañía FB\n", "grupo.describe().loc['FB']" ] }, { "cell_type": "markdown", "metadata": { "id": "wTlqxy01LMmX" }, "source": [ "# Fusionado, Unión y Concatenación" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "id": "nS4wXe5oLMmX" }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "executionInfo": { "elapsed": 543, "status": "ok", "timestamp": 1603983386286, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "xCKla5NdLMmZ" }, "outputs": [], "source": [ "# Generación de los sets de datos a utilizar\n", "df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\n", " 'B': ['B0', 'B1', 'B2', 'B3'],\n", " 'C': ['C0', 'C1', 'C2', 'C3'],\n", " 'D': ['D0', 'D1', 'D2', 'D3']},\n", " index=[0, 1, 2, 3])\n", "\n", "df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],\n", " 'B': ['B4', 'B5', 'B6', 'B7'],\n", " 'C': ['C4', 'C5', 'C6', 'C7'],\n", " 'D': ['D4', 'D5', 'D6', 'D7']},\n", " index=[4, 5, 6, 7]) \n", "\n", "df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],\n", " 'B': ['B8', 'B9', 'B10', 'B11'],\n", " 'C': ['C8', 'C9', 'C10', 'C11'],\n", " 'D': ['D8', 'D9', 'D10', 'D11']},\n", " index=[8, 9, 10, 11])" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 593, "status": "ok", "timestamp": 1603983387756, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "rgrfPqpbLMmb", "outputId": "98a78cb5-fecb-453c-ded9-b7dc9682c30a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " A B C D\n", "0 A0 B0 C0 D0\n", "1 A1 B1 C1 D1\n", "2 A2 B2 C2 D2\n", "3 A3 B3 C3 D3\n", " A B C D\n", "4 A4 B4 C4 D4\n", "5 A5 B5 C5 D5\n", "6 A6 B6 C6 D6\n", "7 A7 B7 C7 D7\n", " A B C D\n", "8 A8 B8 C8 D8\n", "9 A9 B9 C9 D9\n", "10 A10 B10 C10 D10\n", "11 A11 B11 C11 D11\n" ] } ], "source": [ "print (df1)\n", "print (df2)\n", "print (df3)" ] }, { "cell_type": "markdown", "metadata": { "id": "9UmegRNOLMme" }, "source": [ "## Concatenacion\n", "\n", "La concatenación, básicamente une diferentes DataFrames. Hay que tener en cuenta que las dimensiones (respecto del eje usado en la concatenación) de los diferentes DataFrames, deben ser iguales." ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 426 }, "executionInfo": { "elapsed": 582, "status": "ok", "timestamp": 1603983389813, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "BulCP49TLMme", "outputId": "824b65c4-d7bb-43d9-e846-795b9951be57" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
0A0B0C0D0
1A1B1C1D1
2A2B2C2D2
3A3B3C3D3
4A4B4C4D4
5A5B5C5D5
6A6B6C6D6
7A7B7C7D7
8A8B8C8D8
9A9B9C9D9
10A10B10C10D10
11A11B11C11D11
\n", "
" ], "text/plain": [ " A B C D\n", "0 A0 B0 C0 D0\n", "1 A1 B1 C1 D1\n", "2 A2 B2 C2 D2\n", "3 A3 B3 C3 D3\n", "4 A4 B4 C4 D4\n", "5 A5 B5 C5 D5\n", "6 A6 B6 C6 D6\n", "7 A7 B7 C7 D7\n", "8 A8 B8 C8 D8\n", "9 A9 B9 C9 D9\n", "10 A10 B10 C10 D10\n", "11 A11 B11 C11 D11" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat([df1,df2,df3])" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3790, "status": "ok", "timestamp": 1603894630496, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "rdUueOuqLMmf", "outputId": "df782ba1-ba64-44ec-bc5a-76ef265234d1" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCDABCDABCD
0A0B0C0D0NaNNaNNaNNaNNaNNaNNaNNaN
1A1B1C1D1NaNNaNNaNNaNNaNNaNNaNNaN
2A2B2C2D2NaNNaNNaNNaNNaNNaNNaNNaN
3A3B3C3D3NaNNaNNaNNaNNaNNaNNaNNaN
4NaNNaNNaNNaNA4B4C4D4NaNNaNNaNNaN
5NaNNaNNaNNaNA5B5C5D5NaNNaNNaNNaN
6NaNNaNNaNNaNA6B6C6D6NaNNaNNaNNaN
7NaNNaNNaNNaNA7B7C7D7NaNNaNNaNNaN
8NaNNaNNaNNaNNaNNaNNaNNaNA8B8C8D8
9NaNNaNNaNNaNNaNNaNNaNNaNA9B9C9D9
10NaNNaNNaNNaNNaNNaNNaNNaNA10B10C10D10
11NaNNaNNaNNaNNaNNaNNaNNaNA11B11C11D11
\n", "
" ], "text/plain": [ " A B C D A B C D A B C D\n", "0 A0 B0 C0 D0 NaN NaN NaN NaN NaN NaN NaN NaN\n", "1 A1 B1 C1 D1 NaN NaN NaN NaN NaN NaN NaN NaN\n", "2 A2 B2 C2 D2 NaN NaN NaN NaN NaN NaN NaN NaN\n", "3 A3 B3 C3 D3 NaN NaN NaN NaN NaN NaN NaN NaN\n", "4 NaN NaN NaN NaN A4 B4 C4 D4 NaN NaN NaN NaN\n", "5 NaN NaN NaN NaN A5 B5 C5 D5 NaN NaN NaN NaN\n", "6 NaN NaN NaN NaN A6 B6 C6 D6 NaN NaN NaN NaN\n", "7 NaN NaN NaN NaN A7 B7 C7 D7 NaN NaN NaN NaN\n", "8 NaN NaN NaN NaN NaN NaN NaN NaN A8 B8 C8 D8\n", "9 NaN NaN NaN NaN NaN NaN NaN NaN A9 B9 C9 D9\n", "10 NaN NaN NaN NaN NaN NaN NaN NaN A10 B10 C10 D10\n", "11 NaN NaN NaN NaN NaN NaN NaN NaN A11 B11 C11 D11" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat([df1,df2,df3],axis=1)" ] }, { "cell_type": "markdown", "metadata": { "id": "k2KMny7nLMmh" }, "source": [ "## Fusionado\n", "\n", "Permite la unión de diferentes DataFrames usando una lógica similar a la SQL a la hora de fusionar tablas." ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "executionInfo": { "elapsed": 585, "status": "ok", "timestamp": 1603983441487, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "tPwg9EzNLMmh" }, "outputs": [], "source": [ "izquierda = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n", " 'A': ['A0', 'A1', 'A2', 'A3'],\n", " 'B': ['B0', 'B1', 'B2', 'B3']})\n", " \n", "derecha = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n", " 'C': ['C0', 'C1', 'C2', 'C3'],\n", " 'D': ['D0', 'D1', 'D2', 'D3']}) " ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3784, "status": "ok", "timestamp": 1603894630497, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "EpfI4Vo6LMml", "outputId": "75850b08-07d7-4580-ab2a-2a00034399c2" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyAB
0K0A0B0
1K1A1B1
2K2A2B2
3K3A3B3
\n", "
" ], "text/plain": [ " key A B\n", "0 K0 A0 B0\n", "1 K1 A1 B1\n", "2 K2 A2 B2\n", "3 K3 A3 B3" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "izquierda" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3770, "status": "ok", "timestamp": 1603894630497, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "t4HTEqnCLMmp", "outputId": "7ec4cf81-8d14-4f9e-f1d9-6aae893b307b" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyCD
0K0C0D0
1K1C1D1
2K2C2D2
3K3C3D3
\n", "
" ], "text/plain": [ " key C D\n", "0 K0 C0 D0\n", "1 K1 C1 D1\n", "2 K2 C2 D2\n", "3 K3 C3 D3" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "derecha" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3767, "status": "ok", "timestamp": 1603894630498, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "dThN71RYLMmr", "outputId": "df012e31-0c09-4a18-9cc9-2ae33a3ebea3" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyABCD
0K0A0B0C0D0
1K1A1B1C1D1
2K2A2B2C2D2
3K3A3B3C3D3
\n", "
" ], "text/plain": [ " key A B C D\n", "0 K0 A0 B0 C0 D0\n", "1 K1 A1 B1 C1 D1\n", "2 K2 A2 B2 C2 D2\n", "3 K3 A3 B3 C3 D3" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 4 diferentes tipos de fusionado: inner, left, rigth, outer\n", "pd.merge(izquierda,derecha,how='inner',on='key')" ] }, { "cell_type": "markdown", "metadata": { "id": "m1kd4TkcLMmu" }, "source": [ "Unos casos algo más complicados" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "executionInfo": { "elapsed": 1168, "status": "ok", "timestamp": 1603983697423, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "ZZiBQMWuLMmw" }, "outputs": [], "source": [ "izquierda = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],\n", " 'key2': ['K0', 'K1', 'K0', 'K1'],\n", " 'A': ['A0', 'A1', 'A2', 'A3'],\n", " 'B': ['B0', 'B1', 'B2', 'B3']})\n", " \n", "derecha = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],\n", " 'key2': ['K0', 'K0', 'K0', 'K0'],\n", " 'C': ['C0', 'C1', 'C2', 'C3'],\n", " 'D': ['D0', 'D1', 'D2', 'D3']})" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3763, "status": "ok", "timestamp": 1603894630500, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "ygkj4RcXLMmx", "outputId": "36d4e418-51eb-45a7-ab51-0c7ce4617949" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2AB
0K0K0A0B0
1K0K1A1B1
2K1K0A2B2
3K2K1A3B3
\n", "
" ], "text/plain": [ " key1 key2 A B\n", "0 K0 K0 A0 B0\n", "1 K0 K1 A1 B1\n", "2 K1 K0 A2 B2\n", "3 K2 K1 A3 B3" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "izquierda" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3758, "status": "ok", "timestamp": 1603894630500, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "mDQ94qGHLMmz", "outputId": "6116f4a4-e795-49b7-e6c5-7a2acd611e41" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2CD
0K0K0C0D0
1K1K0C1D1
2K1K0C2D2
3K2K0C3D3
\n", "
" ], "text/plain": [ " key1 key2 C D\n", "0 K0 K0 C0 D0\n", "1 K1 K0 C1 D1\n", "2 K1 K0 C2 D2\n", "3 K2 K0 C3 D3" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "derecha" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3756, "status": "ok", "timestamp": 1603894630501, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "87R15GZBLMm2", "outputId": "19953a53-558c-481c-faee-379dc294d8c1" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2ABCD
0K0K0A0B0C0D0
1K1K0A2B2C1D1
2K1K0A2B2C2D2
\n", "
" ], "text/plain": [ " key1 key2 A B C D\n", "0 K0 K0 A0 B0 C0 D0\n", "1 K1 K0 A2 B2 C1 D1\n", "2 K1 K0 A2 B2 C2 D2" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Podemos fusionar las tablas en base a más de una key (columna)\n", "pd.merge(izquierda, derecha, on=['key1', 'key2'])" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3752, "status": "ok", "timestamp": 1603894630501, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "eG-zMbVwLMm4", "outputId": "2e43b951-3d3c-4e7e-81a7-1f60d82c4b77" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2ABCD
0K0K0A0B0C0D0
1K0K1A1B1NaNNaN
2K1K0A2B2C1D1
3K1K0A2B2C2D2
4K2K1A3B3NaNNaN
5K2K0NaNNaNC3D3
\n", "
" ], "text/plain": [ " key1 key2 A B C D\n", "0 K0 K0 A0 B0 C0 D0\n", "1 K0 K1 A1 B1 NaN NaN\n", "2 K1 K0 A2 B2 C1 D1\n", "3 K1 K0 A2 B2 C2 D2\n", "4 K2 K1 A3 B3 NaN NaN\n", "5 K2 K0 NaN NaN C3 D3" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(izquierda, derecha, how='outer', on=['key1', 'key2'])" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3748, "status": "ok", "timestamp": 1603894630501, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "Uaz4LHcjLMm5", "outputId": "af1a5703-8914-448e-b15d-b32629f604a0" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2ABCD
0K0K0A0B0C0D0
1K1K0A2B2C1D1
2K1K0A2B2C2D2
3K2K0NaNNaNC3D3
\n", "
" ], "text/plain": [ " key1 key2 A B C D\n", "0 K0 K0 A0 B0 C0 D0\n", "1 K1 K0 A2 B2 C1 D1\n", "2 K1 K0 A2 B2 C2 D2\n", "3 K2 K0 NaN NaN C3 D3" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(izquierda, derecha, how='right', on=['key1', 'key2'])" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3747, "status": "ok", "timestamp": 1603894630502, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "OGyBe1eFLMnI", "outputId": "617cf7ce-0451-4e6c-9da0-4469612c5de6" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2ABCD
0K0K0A0B0C0D0
1K0K1A1B1NaNNaN
2K1K0A2B2C1D1
3K1K0A2B2C2D2
4K2K1A3B3NaNNaN
\n", "
" ], "text/plain": [ " key1 key2 A B C D\n", "0 K0 K0 A0 B0 C0 D0\n", "1 K0 K1 A1 B1 NaN NaN\n", "2 K1 K0 A2 B2 C1 D1\n", "3 K1 K0 A2 B2 C2 D2\n", "4 K2 K1 A3 B3 NaN NaN" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(izquierda, derecha, how='left', on=['key1', 'key2'])" ] }, { "cell_type": "markdown", "metadata": { "id": "hu620ExJLMnQ" }, "source": [ "## Unión\n", "De 2 DataFrames, con índices iguales o no, en uno sólo," ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "executionInfo": { "elapsed": 1012, "status": "ok", "timestamp": 1603983753979, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "EVQahXzyLMnR", "outputId": "744cd738-bb53-49d2-e08b-ebacf3c2ca98" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
K0A0B0
K1A1B1
K2A2B2
\n", "
" ], "text/plain": [ " A B\n", "K0 A0 B0\n", "K1 A1 B1\n", "K2 A2 B2" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "izquierda = pd.DataFrame({'A': ['A0', 'A1', 'A2'],\n", " 'B': ['B0', 'B1', 'B2']},\n", " index=['K0', 'K1', 'K2']) \n", "\n", "derecha = pd.DataFrame({'C': ['C0', 'C2', 'C3'],\n", " 'D': ['D0', 'D2', 'D3']},\n", " index=['K0', 'K2', 'K3'])\n", "izquierda" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "executionInfo": { "elapsed": 1018, "status": "ok", "timestamp": 1603983773943, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "H0DrCoeA4f1U", "outputId": "31bd5060-eca4-40a6-8782-081dc4984f73" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CD
K0C0D0
K2C2D2
K3C3D3
\n", "
" ], "text/plain": [ " C D\n", "K0 C0 D0\n", "K2 C2 D2\n", "K3 C3 D3" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "derecha" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3740, "status": "ok", "timestamp": 1603894630503, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "IMYv5mffLMnS", "outputId": "96d767f6-c0c6-41ab-b138-5f0264045da5" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
K0A0B0C0D0
K1A1B1NaNNaN
K2A2B2C2D2
\n", "
" ], "text/plain": [ " A B C D\n", "K0 A0 B0 C0 D0\n", "K1 A1 B1 NaN NaN\n", "K2 A2 B2 C2 D2" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Observamos que el registro 2, al no existir en derecha, no se completa a nivel de columnas C y D\n", "izquierda.join(derecha)" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3737, "status": "ok", "timestamp": 1603894630504, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "SJf_lAsCLMnU", "outputId": "cbcdaf10-fe80-4f7f-f91f-d51355d84644" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
K0A0B0C0D0
K1A1B1NaNNaN
K2A2B2C2D2
K3NaNNaNC3D3
\n", "
" ], "text/plain": [ " A B C D\n", "K0 A0 B0 C0 D0\n", "K1 A1 B1 NaN NaN\n", "K2 A2 B2 C2 D2\n", "K3 NaN NaN C3 D3" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "izquierda.join(derecha, how='outer')" ] }, { "cell_type": "markdown", "metadata": { "id": "s9ZidTSnLMnV" }, "source": [ "# Operaciones habituales" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "executionInfo": { "elapsed": 562, "status": "ok", "timestamp": 1603983827177, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "ryJ2WdhwLMnW", "outputId": "6c90732f-29b4-4c8f-adf0-f6ae60d03f28" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2col3
01444abc
12555def
23666ghi
34444xyz
\n", "
" ], "text/plain": [ " col1 col2 col3\n", "0 1 444 abc\n", "1 2 555 def\n", "2 3 666 ghi\n", "3 4 444 xyz" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "euAKyU3SLMnX" }, "source": [ "### Información sobre los valores únicos de una columna" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 556, "status": "ok", "timestamp": 1603983834022, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "-9eCWd-iLMnY", "outputId": "5f345bd8-4af9-452d-fd63-46ff5180d440" }, "outputs": [ { "data": { "text/plain": [ "array([444, 555, 666])" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['col2'].unique()" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 540, "status": "ok", "timestamp": 1603983834023, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "fzS6URDMLMna", "outputId": "0d06da78-4d05-4081-a907-d99c1207f9c5" }, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Total de elementos únicos \n", "df['col2'].nunique()" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 530, "status": "ok", "timestamp": 1603983834026, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "GQ73QFNeLMnb", "outputId": "8e287df9-c3ac-4f78-9ff1-54476745caf7" }, "outputs": [ { "data": { "text/plain": [ "444 2\n", "666 1\n", "555 1\n", "Name: col2, dtype: int64" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['col2'].value_counts()" ] }, { "cell_type": "markdown", "metadata": { "id": "mpVXw_kMLMne" }, "source": [ "### Selección de datos" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 81 }, "executionInfo": { "elapsed": 557, "status": "ok", "timestamp": 1603983929441, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "D_DIv5atLMne", "outputId": "b1ca6b63-ecdc-4c78-c173-971210369ecf" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2col3
34444xyz
\n", "
" ], "text/plain": [ " col1 col2 col3\n", "3 4 444 xyz" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Selección de un DataFrame filtrando en base a valores de columnas\n", "nuevodf = df[(df['col1']>2) & (df['col2']==444)]\n", "nuevodf" ] }, { "cell_type": "markdown", "metadata": { "id": "ti6npzTPLMnf" }, "source": [ "### Funciones Apply" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "id": "B3stK5CaLMng" }, "outputs": [], "source": [ "def doble(x):\n", " return x**2" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3716, "status": "ok", "timestamp": 1603894630509, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "fs-JEYffLMng", "outputId": "3521a5c2-d23f-4f5c-d7b5-c59d5ef66d12" }, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 4\n", "2 9\n", "3 16\n", "Name: col1, dtype: int64" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Aplicamos la función cuadrado A TODOS los elementos de col1. Es una operación columnar, por tanto no hace falta iterar registro a registro\n", "df['col1'].apply(doble)" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3713, "status": "ok", "timestamp": 1603894630509, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "CwoIwCYfLMnk", "outputId": "8c69a473-4dde-4bad-8d45-cb7312f5b4d4" }, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 4\n", "2 9\n", "3 16\n", "Name: col1, dtype: int64" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Mismo resultado de diferente manera\n", "df['col1'].apply(lambda x: x**2)" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3709, "status": "ok", "timestamp": 1603894630510, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "ixiEpRvtLMnm", "outputId": "b6fee2af-3653-4880-91aa-a78d99bfb00d" }, "outputs": [ { "data": { "text/plain": [ "0 3\n", "1 3\n", "2 3\n", "3 3\n", "Name: col3, dtype: int64" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Obtener el tamaño de las diferentes filas\n", "df['col3'].apply(len)" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3704, "status": "ok", "timestamp": 1603894630510, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "W_Z2yRxHLMnn", "outputId": "1c4dbf1d-0181-464a-b489-e0994e60a74f" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col2col3
0444abc
1555def
2666ghi
3444xyz
\n", "
" ], "text/plain": [ " col2 col3\n", "0 444 abc\n", "1 555 def\n", "2 666 ghi\n", "3 444 xyz" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Eliminar columnas (ojo, hasta no usar inplace no se eliminan del set original)\n", "df.drop('col1',axis=1)" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3701, "status": "ok", "timestamp": 1603894630510, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "pQ7Kq58DLMnr", "outputId": "7afc6dd7-b17f-4955-ea5f-c14a8a629438" }, "outputs": [ { "data": { "text/plain": [ "Index(['col1', 'col2', 'col3'], dtype='object')" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 708, "status": "ok", "timestamp": 1603984313234, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "0Dp8IP3QLMnt", "outputId": "6f40f200-a46f-45ae-d2d1-b88765753084" }, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=4, step=1)" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 554, "status": "ok", "timestamp": 1603984362085, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "R6RxhYklLMnv", "outputId": "bf757741-0b6f-4d35-94fa-3cf1d75c0b1c" }, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['col1'].sum()" ] }, { "cell_type": "markdown", "metadata": { "id": "qfUZyqKVLMnw" }, "source": [ "**Eliminar permanentemente una columna**" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3690, "status": "ok", "timestamp": 1603894630512, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "3AgVeF0lLMnw", "outputId": "891abf3a-224d-4d68-cef9-19f85048c7cb" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col2col3
0444abc
1555def
2666ghi
3444xyz
\n", "
" ], "text/plain": [ " col2 col3\n", "0 444 abc\n", "1 555 def\n", "2 666 ghi\n", "3 444 xyz" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "del df['col1']\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "BL0209SNLMn0" }, "source": [ "**Ordenar los DataFrames:**" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3687, "status": "ok", "timestamp": 1603894630512, "user": { "displayName": "Germán Alonso", "photoUrl": "", "userId": "00631987819198041764" }, "user_tz": -60 }, "id": "7_OMhmAZLMn2", "outputId": "52c3ac1b-586f-45c5-8a80-32c451704bea" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col2col3
0444abc
3444xyz
1555def
2666ghi
\n", "
" ], "text/plain": [ " col2 col3\n", "0 444 abc\n", "3 444 xyz\n", "1 555 def\n", "2 666 ghi" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(by='col2') #inplace=False por defecto\n", "\n", "# Nota, observad como el índice no varía. Cada registro sigue manteniendo el índice original." ] } ], "metadata": { "colab": { "name": "07.Pandas.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 4 }