Haraball.no

Create two or more new columns from a function applied to a pandas dataframe

2016-12-21

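import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})  # example data

def calculate_something(row):
    """A function that returns two values"""
    return row.iloc[0] + row.iloc[1], row.iloc[0] - row.iloc[2]  # positional access

# zip(*...) regroups the per-row tuples into one tuple per new column
df['new_col1'], df['new_col2'] = zip(*df.apply(calculate_something, axis=1))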
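Since pandas 0.23, DataFrame.apply also accepts result_type='expand', which turns each returned tuple into columns directly, with no zip needed. A minimal sketch, assuming a small example DataFrame with hypothetical columns a, b and c:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

def calculate_something(row):
    """A function that returns two values"""
    return row['a'] + row['b'], row['a'] - row['c']

# result_type='expand' turns each returned tuple into a row of a new DataFrame
expanded = df.apply(calculate_something, axis=1, result_type='expand')
expanded.columns = ['new_col1', 'new_col2']

# Join the new columns onto the original DataFrame, aligned on the index
df = df.join(expanded)
print(df)
```

Renaming the expanded columns before the join keeps the result identical to the zip version.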

Find all rows and columns in a pandas DataFrame which fit a clause

2016-11-18

import pandas as pd

data = {
    'A': [3, 3, 5, 3],
    'B': [7, 2, 9, 2],
    'C': [3, 3, 3, 0],
    'D': [4, 4, 4, 1],
    'E': [9, 2, 5, 0]
}

df = pd.DataFrame(data)

print(df)

#    A  B  C  D  E
# 0  3  7  3  4  9
# 1  3  2  3  4  2
# 2  5  9  3  4  5
# 3  3  2  0  1  0

# We only want the cells with values greater than 2
df > 2

#       A      B      C      D      E
# 0  True   True   True   True   True
# 1  True  False   True   True  False
# 2  True   True   True   True   True
# 3  True  False  False  False  False

Here, rows 0 and 2 are True in every column. How can these rows be found?

Assign the clause to a new DataFrame:

df2 = df > 2

df2.all(axis=1)

# 0     True
# 1    False
# 2     True
# 3    False

With axis=1, all() checks across each row, so it returns True only for rows where every value satisfies the clause.

Use axis=0 to check each column instead. The clause > 2 holds for every value only in the first column (A):

df2.all(axis=0)

# A     True
# B    False
# C    False
# D    False
# E    False
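To get the matching rows or columns themselves, use these boolean Series as masks. A short sketch reusing the same data:

```python
import pandas as pd

data = {
    'A': [3, 3, 5, 3],
    'B': [7, 2, 9, 2],
    'C': [3, 3, 3, 0],
    'D': [4, 4, 4, 1],
    'E': [9, 2, 5, 0]
}
df = pd.DataFrame(data)
mask = df > 2

# Keep only the rows where every value is greater than 2
rows = df[mask.all(axis=1)]

# Keep only the columns where every value is greater than 2
cols = df.loc[:, mask.all(axis=0)]

print(rows)
print(cols)
```

Swap all() for any() to keep rows or columns where at least one value matches the clause.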

Apply a function to every row in a Pandas DataFrame

2016-11-17

Create a DataFrame to run functions on:

import pandas as pd

data = {
    'one': [1, 1, 1], 
    'two': [2, 2, 2], 
    'three': [3, 3, 3]
}

df = pd.DataFrame(data)

print(df)

#    one  three  two
# 0    1      3    2
# 1    1      3    2
# 2    1      3    2

Define a function that takes a row and returns double the value in its three column:

def double_threes(row):
    return row['three'] * 2

Run the function on each row using apply with axis=1, and assign the result back to the three column, overwriting the original values:

df['three'] = df.apply(double_threes, axis=1)

print(df)

#    one  three  two
# 0    1      6    2
# 1    1      6    2
# 2    1      6    2

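Starting again from the original data, add the results to a new column instead:

df = pd.DataFrame(data)  # fresh copy, so three still holds the original values
df['double_three'] = df.apply(double_threes, axis=1)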

print(df)

#    one  three  two  double_three
# 0    1      3    2             6
# 1    1      3    2             6
# 2    1      3    2             6
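For simple arithmetic like this, a vectorized column operation gives the same result without calling a Python function once per row, which is usually much faster on large DataFrames. A sketch using the same data:

```python
import pandas as pd

data = {
    'one': [1, 1, 1],
    'two': [2, 2, 2],
    'three': [3, 3, 3]
}
df = pd.DataFrame(data)

# Multiply the whole column at once instead of row by row
df['double_three'] = df['three'] * 2
print(df)
```

apply with axis=1 is still the tool when the per-row logic cannot be expressed as column operations.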

Merge PDFs using the terminal in macOS

2016-11-04

Install Poppler via Homebrew:

brew install poppler

Poppler includes pdfunite, which merges PDFs with one command. The input files come first, in order, and the last argument is the output file:

pdfunite banana.pdf apple.pdf result.pdf

Tip: Name the PDF files in the order you want them, e.g. by prefixing numbers:

pdfunite 01_apple.pdf 02_lemon.pdf 03_banana.pdf result.pdf
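The numbering tip can also be scripted. A minimal Python sketch, assuming pdfunite is on the PATH and the numbered files sit in the current directory; build_pdfunite_command and merge_pdfs are hypothetical helper names:

```python
import glob
import subprocess

def build_pdfunite_command(inputs, output):
    """Return the pdfunite arguments with the inputs in sorted (numbered) order."""
    return ["pdfunite", *sorted(inputs), output]

def merge_pdfs(pattern="[0-9]*.pdf", output="result.pdf"):
    """Merge every PDF matching pattern into output using pdfunite."""
    inputs = [f for f in glob.glob(pattern) if f != output]
    if not inputs:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # check=True raises CalledProcessError if pdfunite fails
    subprocess.run(build_pdfunite_command(inputs, output), check=True)
```

Sorting is plain string sorting, so zero-padded prefixes such as 01_, 02_ keep 10_ from landing before 2_.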

Cleaning used Docker containers with a cronjob

2016-10-19

Following this guide, I created a small cron job that removes Docker containers with status "exited" or "created" every 10 minutes so they don't fill up my disk.

I created a script at ~/cronjobs/clean_docker.sh (made executable with chmod +x ~/cronjobs/clean_docker.sh) containing:

#!/bin/bash
docker rm $(docker ps -aq -f status=exited) >/dev/null 2>&1
docker rm $(docker ps -aq -f status=created) >/dev/null 2>&1

Then I ran crontab -e in my terminal, and added the line

*/10 * * * *  ~/cronjobs/clean_docker.sh >> ~/cronjobs/logs/clean_docker.log 2>&1

to the file.

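This runs the cleanup every 10 minutes and appends any output to ~/cronjobs/logs/clean_docker.log. Create the logs directory first, since the redirect will not create it.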
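A Python version of the same cleanup can skip docker rm when there is nothing to remove, instead of silencing the error. A sketch, assuming the docker CLI is on the PATH and Python 3.7+; container_ids, parse_ids and clean are hypothetical helper names:

```python
import subprocess

def parse_ids(output):
    """Split `docker ps -q` output into a list of IDs, ignoring blank lines."""
    return [line.strip() for line in output.splitlines() if line.strip()]

def container_ids(status):
    """Return the IDs of all containers (running or not) with the given status."""
    result = subprocess.run(
        ["docker", "ps", "-aq", "-f", f"status={status}"],
        capture_output=True, text=True, check=True,
    )
    return parse_ids(result.stdout)

def clean(statuses=("exited", "created")):
    """Remove every container with one of the given statuses."""
    for status in statuses:
        ids = container_ids(status)
        if ids:  # docker rm with no arguments would fail
            subprocess.run(["docker", "rm", *ids], check=True)
```

Because errors are no longer hidden, a failing docker rm now shows up in the cron log.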

Query a Postgres database via an SSH tunnel

2016-10-14

To connect to a Postgres server via an SSH tunnel, first set up the tunnel. This forwards port 63333 on your machine to port 5432 (the PostgreSQL default) on the server:

ssh -L 63333:localhost:5432 harald@ip.ad.dr.ess

Then, from another terminal while the tunnel is open, point the client at the local port you specified in the previous command:

psql -h localhost -p 63333 postgres
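The same setup can be driven from Python. A minimal sketch that only builds the ssh arguments and a libpq-style connection string; tunnel_command and local_dsn are hypothetical helpers, and passing the string to a driver such as psycopg2 (psycopg2.connect(dsn)) is an assumption about your environment:

```python
def tunnel_command(server, local_port=63333, remote_port=5432):
    """ssh arguments forwarding localhost:local_port to the server's remote_port."""
    # -N: run no remote command, just keep the port forward open
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", server]

def local_dsn(dbname="postgres", local_port=63333):
    """libpq connection string for a database reached through the tunnel."""
    return f"host=localhost port={local_port} dbname={dbname}"

print(" ".join(tunnel_command("harald@ip.ad.dr.ess")))
print(local_dsn())
```

Running the tunnel with -N means the ssh session holds only the forward, without opening a shell on the server.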

scp a folder to or from a server

2016-09-29

To transfer a local folder to a server, run:

scp -r folder/ user@<ip-address>:/destination/path/

The folder will then end up in /destination/path/folder with all its contents.

To transfer a folder from a server to your local machine, run:

scp -r user@<ip-address>:/destination/path/folder /Users/username

The folder will now end up in /Users/username/folder.

If you have set up a host alias in ~/.ssh/config, e.g.

Host servername
 HostName <ip-address>
 PreferredAuthentications publickey
 IdentityFile ~/.ssh/id_rsa

you can use the alias in place of user@<ip-address>:

scp -r folder/ servername:/destination/path
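A small Python sketch of the download case, showing the command and where the copied folder lands (local directory plus the last component of the remote path). scp_download_command and expected_local_path are hypothetical helpers, and running the command with subprocess.run is left to the caller:

```python
import posixpath

def scp_download_command(host, remote_folder, local_dir):
    """scp arguments that copy remote_folder from host into local_dir."""
    return ["scp", "-r", f"{host}:{remote_folder}", local_dir]

def expected_local_path(remote_folder, local_dir):
    """Where the copy ends up: local_dir/<last component of remote_folder>."""
    name = posixpath.basename(remote_folder.rstrip("/"))
    return posixpath.join(local_dir, name)

print(scp_download_command("servername", "/destination/path/folder", "/Users/username"))
print(expected_local_path("/destination/path/folder", "/Users/username"))
```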

Error: command 'clang' failed with exit status 1

2016-09-27

When installing scikit-bio (skbio) on macOS with Python 3.5.1, I got the following error:

...
    skbio/stats/__subsample.c:250:10: fatal error: 'numpy/arrayobject.h' file not found
    #include "numpy/arrayobject.h"
             ^
    1 error generated.
    error: command 'clang' failed with exit status 1

To fix this, I added NumPy's header directory to the compiler flags before installing again:

export CFLAGS="-I /Users/username/projectname/.venv/lib/python3.5/site-packages/numpy/core/include $CFLAGS"

where .venv is the directory of my virtual environment. Adjust the path to match your own project and Python version.
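Rather than typing the path to NumPy's headers by hand, you can ask NumPy for it with numpy.get_include(). A small sketch, assuming NumPy is installed in the active virtual environment:

```python
import os
import numpy

# Directory that contains the numpy/ header folder
include_dir = numpy.get_include()

# The header clang could not find should live inside it
header = os.path.join(include_dir, "numpy", "arrayobject.h")
print(include_dir)
print(os.path.isfile(header))
```

From the shell, the same value can be substituted straight into the flag: export CFLAGS="-I $(python -c 'import numpy; print(numpy.get_include())') $CFLAGS"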

Passing arguments to docker run

2016-09-06

A quick example of how to pass arguments to a Python script running in a Docker container.

The hello.py Python script:

#!/usr/bin/python
import sys
print("Hello %s" % sys.argv[1])

Dockerfile:

FROM python:2.7-slim
COPY . /src
CMD ["python", "/src/hello.py"]

Put those two files in the same folder:

$ ls -1
Dockerfile
hello.py

Build the docker image:

docker build -t pytest .

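Run the script with an argument. Anything after the image name replaces the CMD from the Dockerfile, so the full command is given before the argument: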

$ docker run pytest python /src/hello.py world
Hello world

Note that the argument index in the script is 1, not 0, because sys.argv[0] holds the path of the script itself.
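A slightly more defensive hello.py prints a usage message instead of crashing with an IndexError when the container is run without an argument. A sketch; greet is a hypothetical helper name:

```python
#!/usr/bin/env python
import sys

def greet(argv):
    """Build the greeting from an argv-style list (argv[0] is the script path)."""
    if len(argv) < 2:
        return "Usage: hello.py NAME"
    return "Hello %s" % argv[1]

if __name__ == "__main__":
    print(greet(sys.argv))
```

Keeping the logic in a function that takes the argument list makes it easy to test without going through docker run.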

Check the exit status of a Linux command

2016-09-05

If you need to check the result of a Linux command, append echo $? to the end of the line. $? holds the exit status of the last command: 0 means success, and any non-zero value (often 1) means an error.

For example, to test whether a folder exists:

$ test -d exists/; echo $?
0
$ test -d dontexist/; echo $?
1
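From Python, the exit status is available as returncode on the result of subprocess.run. A sketch running the same test -d checks, assuming a Unix-like system with the test command on the PATH; exit_status and folder_exists are hypothetical helper names:

```python
import subprocess

def exit_status(args):
    """Run a command and return its exit status (what $? would show)."""
    return subprocess.run(args).returncode

def folder_exists(path):
    """True if `test -d path` succeeds."""
    return exit_status(["test", "-d", path]) == 0

print(exit_status(["test", "-d", "/"]))
print(folder_exists("dontexist/"))
```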