Category Archives: Python Web API

Python(2) Pinterest API

Purpose

When I tried to collect images for AI (machine learning), I ran into more obstacles than expected.

I used Python to run a Pinterest search and retrieve information such as the image URLs from the results.

As for the Pinterest API, it apparently can search only the Boards and Pins of your own account.

What I want, though, is to search across all Pins, not just my own.

Another way to get images for AI (machine learning) is to use publicly released datasets such as those in Reference 2.

Code

The code is taken almost as-is from Reference 1:

import os, sys, time
import requests
import json
import bs4 # BeautifulSoup 4
import re  # for "findall"
 
# Save an image file
def save_image(file_name, image):
    with open(file_name, 'wb') as f:
        f.write(image)

def search(query, num_pins):
 
    # First access
    url     = 'https://www.pinterest.jp/search/pins/'
    headers = {
        'connection': 'keep-alive'
    }
 
    search_response = requests.get(url, params={'q':query}, headers=headers, stream=False)
    soup            = bs4.BeautifulSoup(search_response.text.replace('\n',''), 'html5lib')
 
    data_json_string = soup.find('script', type='application/json') # extract json string
    data_json        = json.loads(data_json_string.string) # convert into dictionary type variable
    results          = data_json['tree']['children'][0]['data']['results']
#    results          = data_json['resouceDataCache'][0]['children'][0]['data']['results']
 
    image_info_list  = []
    for r in results:
        image_info = {}
        image_info['description'] = r['description']
        image_info['link']        = r['link']
        image_info['image_url']   = r['images']['orig']['url']
        image_info['id']          = r['id']
        image_info_list.append(image_info)
 
 
    # Second or later access to load additional pins that are responded as a JSON string
    url             = 'https://www.pinterest.jp/resource/BaseSearchResource/get/'
    bookmarks       = data_json['resourceDataCache'][0]['resource']['options']['bookmarks']
    experiment_hash = data_json['context']['triggerable_experiments_hash']
    last_cookies    = search_response.cookies
 
    while len(image_info_list) < num_pins:
 
        ## Preparing parameters, headers and cookies for the "get" request
        params = {
            'source_url':'/search/pins/?q={}'.format(query),
            'data':json.dumps({
                'options':{
                    'bookmarks':bookmarks,
                    'query':query,
                    'scope':'pins',
                    'page_size':25,
                    'field_set_key':'unauth_react'
                },
                'context':{}}),
            '_':str(int(time.time())*10*10*10)
        }
 
        headers = {
            'Host':'www.pinterest.jp',
            'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
            'Accept-Language':'ja,en-US;q=0.7,en;q=0.3',
            'X-Pinterest-AppState': 'background',
            'X-Pinterest-ExperimentHash': experiment_hash,
            'X-NEW-APP':'1',
            'X-APP-VERSION':'9b11f84',
            'X-Requested-With':'XMLHttpRequest',
            'Referer':'https://www.pinterest.jp',
            'cookie':json.dumps({
                '_auth':dict(last_cookies)['_auth'],
                'csrftoken':dict(last_cookies)['csrftoken'],
                '_pinterest_sess':dict(last_cookies)['_pinterest_sess']}),
            'connection':'keep-alive'
        }
 
        cookies = {
            '_auth':dict(last_cookies)['_auth'],
            'csrftoken':dict(last_cookies)['csrftoken'],
            '_pinterest_sess':dict(last_cookies)['_pinterest_sess'],
            'bei':'False',
            'logged_out':'True',
            'fba':'True',
            'sessionFunelEventLogged':'1'
        }
 
        search_response = requests.get(url, cookies=cookies, params=params, headers=headers, stream=False)
        data_json       = json.loads(search_response.text)
        results         = data_json['resource_response']['data']['results']
 
        bookmarks       = data_json['resource']['options']['bookmarks']
        experiment_hash = data_json['client_context']['triggerable_experiments_hash']
        last_cookies    = search_response.cookies
 
        for r in results:
            image_info = {}
            image_info['description'] = r['description']
            image_info['link']        = r['link']
            image_info['image_url']   = r['images']['orig']['url']
            image_info['id']          = r['id']
            image_info_list.append(image_info)
 
    return image_info_list
 
 
def main(argv):
    keyword  = 'xxx' # keyword you want to search
    num_pins = 100 # Number of pins searched
    img_dir  = 'images'
    timeout = 10 # in second
    params  = {} # not used
    cookies = {} # not used
    headers = {} # not used
 
    image_info_list = search(keyword, num_pins)

    # Make sure the output directory exists before saving into it
    if not os.path.isdir(img_dir):
        os.makedirs(img_dir)

    for img_info in image_info_list:
        img_url = img_info['image_url']
        # Retrieve the file name of the image
        name_search = re.findall(r'/([a-zA-Z0-9:.=_-]*\.(?:jpg|jpeg|JPG|JPEG))', img_url)
        if not name_search:
            continue # skip URLs that do not contain a JPEG file name
        img_name    = name_search[0]
 
        # Get the content of the image
        img_response = requests.get(img_url, timeout=timeout, params=params, cookies=cookies, headers=headers, stream=False)
        try:
            img_response.raise_for_status() # raises requests.HTTPError on a 4xx/5xx response
        except requests.HTTPError:
            sys.exit('HTTP Error When Accessing The Image File!') # terminate the script if the request did not succeed
 
        # Save the image
        save_image(os.path.join(img_dir, img_name), img_response.content)
 
 
if __name__ == '__main__':
    main(sys.argv)

Pass the search keyword and the number of images you want to the search function; main then saves the resulting images in the images directory.
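The result handling and the file name extraction in the script can be sketched as two small helpers. This is only an illustration under assumptions: the helper names extract_pin_info and file_name_from_url are hypothetical (they do not appear in Reference 1), the sample result is made up, and the file name is taken from the URL path with the standard library rather than with the regular expression used in the script.

```python
import os
from urllib.parse import urlparse


def extract_pin_info(result):
    """Pick the four fields the script uses out of one search result dict.

    Hypothetical helper: 'description' and 'link' fall back to None when
    missing, while the image URL and id are treated as required.
    """
    return {
        'description': result.get('description'),
        'link': result.get('link'),
        'image_url': result['images']['orig']['url'],
        'id': result['id'],
    }


def file_name_from_url(url):
    """Return the last path component of a URL, ignoring any query string."""
    return os.path.basename(urlparse(url).path)


if __name__ == '__main__':
    # Made-up result entry, shaped like the ones the script reads.
    sample = {
        'description': 'a cat',
        'images': {'orig': {'url': 'https://example.com/originals/ab/cd/abcdef.jpg?x=1'}},
        'id': '123',
    }
    info = extract_pin_info(sample)
    print(info['link'])                           # None, because the sample has no link
    print(file_name_from_url(info['image_url']))  # abcdef.jpg
```

Using one helper for both the first page and the follow-up pages would remove the duplicated loop body, and taking the name from the URL path avoids depending on the file extension.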

References

  1. Retrieving Pinterest Pin (image) search results with Python: http://hassiweb-programming.blogspot.com/2017/07/retrieve-pinterest-pins-by-python.html
  2. https://ai.google/tools/datasets/

Python(1) Install MySQL by Homebrew

Installing MySQL

Install MySQL in the local environment (Mac).

ChenLab-MacBookAir-3:~ chen$ brew install mysql
==> Installing dependencies for mysql: openssl
==> Installing mysql dependency: openssl
==> Downloading https://homebrew.bintray.com/bottles/openssl-1.0.2o_2.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring openssl-1.0.2o_2.high_sierra.bottle.tar.gz
==> Caveats
A CA file has been bootstrapped using certificates from the SystemRoots
keychain. To add additional certificates (e.g. the certificates added in
the System keychain), place .pem files in
/usr/local/etc/openssl/certs

and run
/usr/local/opt/openssl/bin/c_rehash

This formula is keg-only, which means it was not symlinked into /usr/local,
because Apple has deprecated use of OpenSSL in favor of its own TLS and crypto libraries.

If you need to have this software first in your PATH run:
echo 'export PATH="/usr/local/opt/openssl/bin:$PATH"' >> ~/.bash_profile

For compilers to find this software you may need to set:
LDFLAGS: -L/usr/local/opt/openssl/lib
CPPFLAGS: -I/usr/local/opt/openssl/include

==> Summary
🍺  /usr/local/Cellar/openssl/1.0.2o_2: 1,792 files, 12.3MB
==> Installing mysql
==> Downloading https://homebrew.bintray.com/bottles/mysql-8.0.11.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring mysql-8.0.11.high_sierra.bottle.tar.gz
==> /usr/local/Cellar/mysql/8.0.11/bin/mysqld --initialize-insecure --user=chen --basedir=/usr/local/Cell
==> Caveats
We've installed your MySQL database without a root password. To secure it run:
mysql_secure_installation

MySQL is configured to only allow connections from localhost by default

To connect run:
mysql -uroot

To have launchd start mysql now and restart at login:
brew services start mysql
Or, if you don't want/need a background service you can just run:
mysql.server start
==> Summary
🍺  /usr/local/Cellar/mysql/8.0.11: 254 files, 232.6MB
ChenLab-MacBookAir-3:~ chen$

Once the installation finishes, check the package information.

ChenLab-MacBookAir-3:~ chen$ brew info mysql
mysql: stable 8.0.11 (bottled)
Open source relational database management system
https://dev.mysql.com/doc/refman/8.0/en/
Conflicts with:
mariadb (because mysql, mariadb, and percona install the same binaries.)
mariadb-connector-c (because both install plugins)
mysql-cluster (because mysql, mariadb, and percona install the same binaries.)
mysql-connector-c (because both install MySQL client libraries)
percona-server (because mysql, mariadb, and percona install the same binaries.)
/usr/local/Cellar/mysql/8.0.11 (254 files, 232.6MB) *
Poured from bottle on 2018-06-25 at 13:33:16
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/mysql.rb
==> Dependencies
Build: cmake ✘
Required: openssl ✔
==> Requirements
Required: macOS >= 10.10 ✔
==> Options
--with-debug
Build with debug support
--with-embedded
Build the embedded server
--with-local-infile
Build with local infile loading support
--with-memcached
Build with InnoDB Memcached plugin
--with-test
Build with unit tests
==> Caveats
We've installed your MySQL database without a root password. To secure it run:
mysql_secure_installation

MySQL is configured to only allow connections from localhost by default

To connect run:
mysql -uroot

To have launchd start mysql now and restart at login:
brew services start mysql
Or, if you don't want/need a background service you can just run:
mysql.server start
ChenLab-MacBookAir-3:~ chen$

Verifying MySQL

Start MySQL and verify that it works. While we are at it, create a database for testing.

ChenLab-MacBookAir-3:~ chen$ mysql.server start
Starting MySQL
.. SUCCESS!
ChenLab-MacBookAir-3:~ chen$ $ mysql -uroot
-bash: $: command not found
ChenLab-MacBookAir-3:~ chen$ mysql -uroot
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.11 Homebrew

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE DATABASE mysqltest DEFAULT CHARACTER SET utf8mb4;
Query OK, 1 row affected (0.01 sec)

mysql> CREATE USER hoge@localhost IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.10 sec)

mysql> GRANT ALL ON mysqltest.* TO hoge@localhost;
Query OK, 0 rows affected (0.01 sec)

mysql> \q
Bye
ChenLab-MacBookAir-3:~ chen$

Package for connecting to MySQL

For connecting to MySQL from Python 3, mysqlclient is apparently the recommended option. Let's install it with pip right away.

pip install mysqlclient

However, the installation failed with an error, so I tried a different package.

ChenLab-MacBookAir-3:work-Python chen$ pip install mysql-connector-python-rf
Collecting mysql-connector-python-rf
Downloading https://files.pythonhosted.org/packages/21/79/2ff01ab7aa08db3a16b70b990c579c1024c6b2a734263cc7513a758867de/mysql-connector-python-rf-2.2.2.tar.gz (11.9MB)
100% |████████████████████████████████| 11.9MB 777kB/s
Building wheels for collected packages: mysql-connector-python-rf
Running setup.py bdist_wheel for mysql-connector-python-rf … done
Stored in directory: /Users/chen/Library/Caches/pip/wheels/87/58/fb/d95c84fad7e1bebfed324c13e107ebb08e1997c9226532859a
Successfully built mysql-connector-python-rf
Installing collected packages: mysql-connector-python-rf
Successfully installed mysql-connector-python-rf-2.2.2
ChenLab-MacBookAir-3:work-Python chen$

ChenLab-MacBookAir-3:work-Python chen$ pip list
Package Version
------------------------- -------
mysql-connector-python-rf 2.2.2
nose 1.3.7
numpy 1.14.3
pip 10.0.1
setuptools 39.1.0
TBB 0.1
wheel 0.31.0
ChenLab-MacBookAir-3:work-Python chen$

Connecting to MySQL from Python

The Python program creates a table named "booklist" in the database created above and inserts a record for a book titled "Python".

# -*- coding: utf-8 -*-
#import MySQLdb
#
#conn = MySQLdb.connect(db='mysqltest',user='hoge',passwd='password',charset='utf8mb4')
import mysql.connector

conn = mysql.connector.connect(user='root', password='', host='localhost', database='mysqltest')
c = conn.cursor()
# Drop the table first if it already exists
c.execute('DROP TABLE IF EXISTS booklist')
# Create the table
c.execute('''
    CREATE TABLE booklist(
      id integer,
      name text,
      kakaku integer
    )
''')
# Insert a row into the table
c.execute('INSERT INTO booklist VALUES(%s,%s,%s)',(1,'Python',2400))
conn.commit()
c.execute('SELECT * FROM booklist')
for row in c.fetchall():
    print(row)
conn.close()

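The output is just one short line. The u prefix on the string shows that this run used a Python 2 interpreter.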

ChenLab-MacBookAir-3:work-Python chen$ python mysql-test.py
(1, u'Python', 2400)
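The same create, insert, and select flow can be tried without a running MySQL server. The sketch below is not the script above: it swaps in the standard library's sqlite3 module with an in-memory database, because both drivers follow the Python DB-API. The function names build_booklist and fetch_booklist and the second book row are hypothetical, added only for illustration. Note that sqlite3 uses ? placeholders where mysql.connector uses %s.

```python
import sqlite3
from contextlib import closing


def build_booklist(conn, books):
    """Recreate the booklist table and fill it with (id, name, kakaku) rows."""
    with closing(conn.cursor()) as c:
        c.execute('DROP TABLE IF EXISTS booklist')
        c.execute('CREATE TABLE booklist (id integer, name text, kakaku integer)')
        # Parameterized insert: sqlite3 uses ?, mysql.connector would use %s.
        c.executemany('INSERT INTO booklist VALUES (?, ?, ?)', books)
    conn.commit()


def fetch_booklist(conn):
    """Return all rows of booklist ordered by id."""
    with closing(conn.cursor()) as c:
        c.execute('SELECT id, name, kakaku FROM booklist ORDER BY id')
        return c.fetchall()


if __name__ == '__main__':
    # In-memory SQLite database standing in for the MySQL server.
    with closing(sqlite3.connect(':memory:')) as conn:
        # The second row is made-up sample data.
        build_booklist(conn, [(1, 'Python', 2400), (2, 'MySQL', 3000)])
        for row in fetch_booklist(conn):
            print(row)
```

Wrapping the cursor and connection in closing() releases them even if a statement raises, which the original script does not guarantee.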

 

Reference

Steps for installing MySQL on a Mac with Homebrew