Logging into CSDN with Python and Fetching the Full List of Blog Posts

Analyzing the login process

For the past few days I have been puzzling over Baidu login and Tieba sign-in. Baidu really is an internet giant: even the login flow is made absurdly complicated. I studied it for quite a while and still couldn't work it out, so I decided to pick an easier target first and settled on CSDN.

The process is very simple, so I won't bother with screenshots. Open a browser, open Fiddler, then log in to CSDN. Fiddler shows the browser sending a POST request to https://passport.csdn.net/account/login?ref=toolbar. The request carries the login form, and the form data is not encrypted. CSDN itself does use HTTPS, though, so security is acceptable.

The request body looks like this; username and password are, of course, the username and the password.

username=XXXXX&password=XXXXXX&rememberMe=true&lt=LT-461600-wEKpWAqbfZoULXmFmDIulKPbL44hAu&execution=e4s1&_eventId=submit

I didn't know what the lt parameter was for at first, but a look at the page showed that all of these values are right there in the form, which makes things easy. CSDN even helpfully annotates them with comments. Incidentally, if you open Baidu's home page you will also see Baidu's recruiting notice printed in the browser console.

HTML screenshot of the login form:


Login code

With all that information in hand, we can log in. Without further ado, here is the code. But first, a few pitfalls I ran into.

The first was a parameter mistake. The logic was fine, but after copy-pasting I forgot to rename a variable: in the login form, three parameters all ended up called lt, so the page returned after login was an error page. I assumed I was missing some required header and flailed around for the better part of a day; only after a lot of Fiddler debugging did I find it.

The second problem is CSDN's sneaky redirect. Because a browser has a built-in JS engine, typing a URL and landing on a page is not necessarily a single request; some JS may first redirect to an intermediate page before jumping to the real one. The _validate_redirect_url(self) function in the code deals with this: the first request after logging in returns an intermediate page containing a pile of JS code, including a redirect URL. We extract that redirect URL and request it once more; only after that 200 OK can subsequent requests reach the real page.

The third problem is whitespace when matching the page with a regular expression. To fetch the articles we first need the total article count, which is easy: just read the number shown on the page, something like "100条 共20页". How do we extract it? At first I used the regex (\d+)条 共(\d+)页, but it didn't match. Looking at the page carefully, it turned out there is not one space between the two phrases but two! The fix is simple: change the regex to (\d+)条\s*共(\d+)页. So whenever whitespace is involved, just match it with \s instead of trying to type exactly one or two spaces.
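Here is a quick, self-contained demonstration of that pitfall (the sample text is made up for illustration; the full login class follows right after):

import re

span_text = '100条  共20页'   # note the TWO spaces between 条 and 共

print(re.findall(r'(\d+)条 共(\d+)页', span_text))    # [] -- a single literal space does not match
print(re.findall(r'(\d+)条\s*共(\d+)页', span_text))  # [('100', '20')]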

import requests
from bs4 import BeautifulSoup
import re
import urllib.parse as parse


class CsdnHelper:
    """Log in to CSDN and list all blog articles."""
    csdn_login_url = 'https://passport.csdn.net/account/login?ref=toolbar'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    }
    blog_url = 'http://write.blog.csdn.net/postlist/'

    def __init__(self):
        self._session = requests.session()
        self._session.headers = CsdnHelper.headers

    def login(self, username, password):
        '''Main login routine'''
        form_data = self._prepare_login_form_data(username, password)
        response = self._session.post(CsdnHelper.csdn_login_url, data=form_data)
        if 'UserNick' in response.cookies:
            nick = response.cookies['UserNick']
            print(parse.unquote(nick))
        else:
            raise Exception('Login failed')

    def _prepare_login_form_data(self, username, password):
        '''Fetch the hidden parameters from the login page and build the form data'''
        response = self._session.get(CsdnHelper.csdn_login_url)
        login_page = BeautifulSoup(response.text, 'lxml')
        login_form = login_page.find('form', id='fm1')

        lt = login_form.find('input', attrs={'name': 'lt'})['value']
        execution = login_form.find('input', attrs={'name': 'execution'})['value']
        eventId = login_form.find('input', attrs={'name': '_eventId'})['value']
        form = {
            'username': username,
            'password': password,
            'lt': lt,
            'execution': execution,
            '_eventId': eventId
        }

        return form

    def _get_blog_count(self):
        '''Get the article count and page count'''
        self._validate_redirect_url()
        response = self._session.get(CsdnHelper.blog_url)
        blog_page = BeautifulSoup(response.text, 'lxml')
        span = blog_page.find('div', class_='page_nav').span
        print(span.string)
        pattern = re.compile(r'(\d+)条\s*共(\d+)页')
        result = pattern.findall(span.string)
        blog_count = int(result[0][0])
        page_count = int(result[0][1])
        return (blog_count, page_count)

    def _validate_redirect_url(self):
        '''Request the intermediate redirect page'''
        response = self._session.get(CsdnHelper.blog_url)
        redirect_url = re.findall(r'var redirect = "(\S+)";', response.text)[0]
        self._session.get(redirect_url)

    def print_blogs(self):
        '''Print information about every article'''
        blog_count, page_count = self._get_blog_count()
        for index in range(1, page_count + 1):
            url = f'http://write.blog.csdn.net/postlist/0/0/enabled/{index}'
            response = self._session.get(url)
            page = BeautifulSoup(response.text, 'lxml')
            links = page.find_all('a', href=re.compile(r'http://blog.csdn.net/u011054333/article/details/(\d+)'))
            print(f'----------Page {index}----------')
            for link in links:
                blog_name = link.string
                blog_url = link['href']
                print(f'Title: 《{blog_name}》  Link: {blog_url}')


if __name__ == '__main__':
    csdn_helper = CsdnHelper()
    username = input("Enter username: ")
    password = input("Enter password: ")
    csdn_helper.login(username, password)
    csdn_helper.print_blogs()

Of course, the crucial part here is the login process; only after logging in can we do anything else. For instance, the next step could be a backup tool that downloads all the articles and images of the CSDN blog to local disk. Interested readers can give it a try.
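A sketch of what the first step of such a backup could look like, reusing the helper above (the backup_article function and the file-naming scheme are my own illustration, not part of the original code):

import re

def backup_article(session, blog_name, blog_url):
    '''Sketch: download one article page and save its raw HTML locally.'''
    response = session.get(blog_url)
    # strip characters that are not allowed in filenames
    filename = re.sub(r'[\\/:*?"<>|]', '_', blog_name) + '.html'
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(response.text)

Calling it from the print_blogs loop as backup_article(self._session, blog_name, blog_url) would dump every article as an HTML file; pulling out just the article body and its images would need some extra parsing on top.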

A long time ago I wrote a Baidu auto-poster, but posting too fast got the account permanently banned. Recently I reworked that old code into a Baidu Tieba client that runs in the command line under Python.

Posting

To make a new post, type p and then, following the prompts, enter the post's title and content, as shown below.

When the post succeeds it looks like the screenshot below.

Colored output in the command window

Adding color makes your little program much more pleasant to look at. The colorama and termcolor libraries make it easy to change the display color of any string.

For example, colored(u"楼主", 'red') renders the word 楼主 (the OP marker) in red.
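A minimal sketch of how the two libraries fit together; colorama's init() is what makes the ANSI codes emitted by termcolor actually render in a Windows console, and the second line just mimics the way the client below colors a front-page row:

from colorama import init
from termcolor import colored

init()  # enable ANSI color handling in the Windows console

# same pattern as the client below: build the string, then encode for the console
print (colored(u"楼主", 'red') + u" is rendered in red").encode("gb18030")
print (colored("index 3", 'magenta') + " : some post title " + colored("-- poster (12)", 'cyan')).encode("gb18030")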


Viewing replies to me

Type the command rm.


This crawls the "replies to me" page.


Unfollowing a forum

Type dislike.


Viewing a post

A preview of the effect:

Direct reply

Type r to reply to the post you are currently viewing. A signature tail is automatically appended after your reply; its content can be customized however you like. I put an image and some random text in mine, which tends to attract a few more replies. You can also pick a signature style by changing 'sign_id':sign_id in the post form inside the program.

For example, replying in the post from earlier with: r

The program first asks whether you want to insert an image. If you do, enter the image URL as prompted; if not, just press Enter to skip.

A local image can be uploaded to an image host first to get a URL, which is then inserted. GIF images have some restrictions, for example width under 530 pixels and size under 3 MB.
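As a sketch, those limits could be checked with PIL before inserting the image, reusing the img_upload file that the Get_size_of_url_img method in the full source downloads; the check itself is my own addition, the client only reads the width and height:

import os.path
from PIL import Image

im = Image.open("img_upload")          # file fetched by Get_size_of_url_img
width, height = im.size
size_mb = os.path.getsize("img_upload") / (1024.0 * 1024.0)

# the 530px / 3 MB thresholds are the GIF limits mentioned above
if im.format == 'GIF' and (width >= 530 or size_mb >= 3):
    print ("GIF too large: %dpx wide, %.1f MB" % (width, size_mb))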

Other common commands

f - refresh the posts on the front page
ft - refresh the replies of the current post; for example, after replying to a post with r, ft refreshes it so you can see your own reply right away
b - go back to the list of posts on the front page
c - clear the screen
e - exit


Browsing the forums I follow

Type the command mf.


Entering a post index

Every post has a corresponding index. If a post interests you, just type t index. For example, if I am interested in post number 20, I type t 20.


Following the current forum

Type like.


Replying inside a floor (楼中楼)

If you don't want to reply to the OP but to a specific floor, type r floor_num. For example, to reply to floor 2 of the post, type r 2, where 2 is the floor number.

After the reply succeeds, the result looks like the screenshot below.

The number of spaces between r and 2 does not matter. Also note that floor-in-floor replies do not carry the signature tail or images.

Entering a forum

Type a followed by the forum name to enter it.

For example, type a first and then 斗鱼TV.

By default this shows the posts on the forum's front page. If you want to browse older posts, type s followed by a page number, for example s 100 to browse page 100. The page numbers here correspond to the page numbers at the bottom of the Tieba web page.

The number after "index" is what you use to open a post, the number in parentheses is the post's current reply count, and the poster's ID is shown on each line as well.

Sign-in

Type si.


Below is a brief description of the approach and the features used.

Figuring out the newly added BSK parameter (updated 8/29/2017)

I found a great repo by a master on GitHub:

https://link.zhihu.com/?target=https%3A//github.com/8qwe24657913/Analyze_baidu_BSK

He has already given a fairly detailed solution there; all I can do is bow in admiration. Here I will just describe how to run his deobf.js JavaScript from Python.

I first tried quite a few Python libraries for running JS, such as js2py, selenium and pyv8, but without luck. I'm not entirely sure why; some JS statements run fine in the browser console but throw errors once run from Python.

So I went with the most reliable approach here: use selenium to drive Firefox's webdriver and let it evaluate that piece of JS. The computed BSK is then put into the POST data, and the post goes through.
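Stripped down, the idea looks like this (it mirrors the solve_bsk method in the full source below; deobf_js stands for the JavaScript from that repo, which defines a bsk_solver(tbs) function and is not reproduced here):

from selenium import webdriver

def solve_bsk_sketch(tbs_str, deobf_js):
    # execute_script returns whatever the top-level JS 'return' yields,
    # so the solver call is appended as a return statement
    driver = webdriver.Firefox()
    try:
        bsk = driver.execute_script(deobf_js + "\nreturn bsk_solver('" + tbs_str + "');")
    finally:
        driver.quit()
    return bsk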


Browsing the threads I posted

Type mt.


This crawls the "my threads" page.

Replying

Of course, a reply feature is essential.

Account login

Here I used a Chrome extension dedicated to importing and exporting cookies.

After logging in to your Baidu account, open any Tieba post page, then use that extension to export the cookies (JSON format) and save them to a cookie.txt file.

The program reads this file automatically to log in. Baidu Tieba's own verification flow is even more convoluted, and I honestly couldn't get a direct login with username, password and captcha to work, so using a cookie file is admittedly a shortcut; for my personal use it is more than enough. If anyone understands the proper login flow, please do share.
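For reference, the way the exported cookie JSON ends up in pycurl is roughly this; it is a condensed version of the Load_cookie method in the full source below, with the proxy options left out:

import json
import pycurl
from urllib import quote_plus          # Python 2.7; urllib.parse.quote_plus on Python 3

with open('cookie.txt') as f:
    cookies = json.load(f)             # list of {name, value, domain, ...} exported by the extension

chunks = ['%s=%s;domain=%s;' % (quote_plus(c['name']), quote_plus(c['value']), c['domain'])
          for c in cookies]

c = pycurl.Curl()
c.setopt(pycurl.COOKIE, ''.join(chunks))
c.setopt(pycurl.URL, 'http://tieba.baidu.com/dc/common/tbs')
# perform() and check is_login in the JSON response, exactly as Load_cookie does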

The code is attached below. There is nothing particularly complicated in it, so if you run into a problem just leave a comment.

I used the pycurl library mainly to cope with my company's NTLM proxy authentication. If your company has no proxy, you can comment out the following few lines:

self.c.setopt(pycurl.PROXY, 'http://192.168.87.15:8080')
self.c.setopt(pycurl.PROXYUSERPWD, 'LL66269:123456789')
self.c.setopt(pycurl.PROXYAUTH, pycurl.HTTPAUTH_NTLM)

New: image preview

Typing pic opens a small window written with PyQt that previews all the images in the current post; the console also prints which floor the current image belongs to and how many images that floor contains.

In the code this is implemented by a class called Pic_Viewer.

The window has four buttons: NF jumps to the next floor that contains images, PF jumps back to the previous floor with images, N previews the next image within the current floor, and P previews the previous image within the current floor.


Full source code (Python 2.7)

The third-party dependencies are:

pycurl, lxml, PIL, colorama, termcolor, selenium, PyQt4

The main program is below. The reply and post features additionally require your Baidu account cookie file (JSON format) and Firefox's webdriver (geckodriver.exe).

# coding=utf-8

import time
import pycurl
import os.path
import sys, locale
import random
from random import randint
import urllib
from urllib import urlencode, quote_plus
from StringIO import StringIO
import json
from pprint import pprint
import re
import lxml.html
import codecs
from HTMLParser import HTMLParser
import unicodedata
from PIL import Image
from colorama import Fore, Back, Style,init
from termcolor import colored
from selenium import webdriver
from PyQt4.QtGui import *
from PyQt4 import QtGui



# class definition

class Pic_Viewer(QtGui.QWidget):


    def __init__(self,all_pic_list):
        super(Pic_Viewer, self).__init__()
        #self.url_list=['http://static.cnbetacdn.com/article/2017/0831/8eb7de909625140.png','http://static.cnbetacdn.com/article/2017/0831/7f11d5ec94fa123.png','http://static.cnbetacdn.com/article/2017/0831/1b6595175fb5486.jpg']

        self.url_dict=all_pic_list
        self.url_floor_num=self.url_dict.keys()
        self.url_floor_num.sort(key=lambda x: int(x),reverse=False)   # sort the key list  by floor number
        self.current_pic_floor=0
        self.current_pic_index=0

        self.initUI()
        #time.sleep(5)

    def initUI(self):
        QtGui.QToolTip.setFont(QtGui.QFont('Test', 10))
        self.setToolTip('This is a <b>QWidget</b> widget')


      # Show  image
        self.pic = QtGui.QLabel(self)
        self.pic.setGeometry(0, 0, 600, 500)
        #self.pic.setPixmap(QtGui.QPixmap("/home/lpp/Desktop/image1.png"))
        pixmap = QPixmap()
        data=self.retrieve_from_url(self.url_dict[self.url_floor_num[0]][0])
        pixmap.loadFromData(data)
        self.pic.setPixmap(pixmap)
        #self.pic.setPixmap(QtGui.QPixmap.loadFromData(data))


        # Show button 
        btn_next_same_floor = QtGui.QPushButton('N', self)
        btn_next_same_floor.setToolTip('This is a <b>QPushButton</b> widget')
        btn_next_same_floor.resize(btn_next_same_floor.sizeHint())
        btn_next_same_floor.clicked.connect(self.fun_next_pic_same_floor)
        btn_next_same_floor.move(400, 0)

        btn_prev_same_floor = QtGui.QPushButton('P', self)
        btn_prev_same_floor.setToolTip('This is a <b>QPushButton</b> widget')
        btn_prev_same_floor.resize(btn_prev_same_floor.sizeHint())
        btn_prev_same_floor.clicked.connect(self.fun_prev_pic_same_floor)
        btn_prev_same_floor.move(100, 0)


        btn_next_floor = QtGui.QPushButton('NF', self)
        btn_next_floor.setToolTip('This is a <b>QPushButton</b> widget')
        btn_next_floor.resize(btn_next_floor.sizeHint())
        btn_next_floor.clicked.connect(self.fun_next_floor)
        btn_next_floor.move(500, 0)

        btn_prev_floor = QtGui.QPushButton('PF', self)
        btn_prev_floor.setToolTip('This is a <b>QPushButton</b> widget')
        btn_prev_floor.resize(btn_prev_floor.sizeHint())
        btn_prev_floor.clicked.connect(self.fun_prev_floor)
        btn_prev_floor.move(0, 0)


        self.setGeometry(300, 300, 600, 500)
        self.setWindowTitle('ImgViewer')
        self.show()
        self.print_current_location()

    def retrieve_from_url(self,pic_url):
        c = pycurl.Curl()
        c.setopt(pycurl.PROXY, 'http://192.168.87.15:8080')
        c.setopt(pycurl.PROXYUSERPWD, 'LL66269:')
        c.setopt(pycurl.PROXYAUTH, pycurl.HTTPAUTH_NTLM)
        buffer = StringIO()
        c.setopt(pycurl.URL, pic_url)
        c.setopt(c.WRITEDATA, buffer)
        c.perform()
        c.close()  
        data = buffer.getvalue()
        return data  


    def print_current_location(self):
        sys.stdout.write('\r')
        sys.stdout.write("[ %sL ] %s (%d)" % (self.url_floor_num[self.current_pic_floor], str(self.current_pic_index+1),len(self.url_dict[self.url_floor_num[self.current_pic_floor]])))
        sys.stdout.flush()

    # Connect button to image updating 
    def fun_next_pic_same_floor(self):
        if len(self.url_dict[self.url_floor_num[self.current_pic_floor]])>1:
            if self.current_pic_index < len(self.url_dict[self.url_floor_num[self.current_pic_floor]])-1:
                self.current_pic_index=self.current_pic_index+1
            else:
                self.current_pic_index=0


            pixmap = QPixmap()
            data=self.retrieve_from_url(self.url_dict[self.url_floor_num[self.current_pic_floor]][self.current_pic_index])
            pixmap.loadFromData(data)
            self.pic.setPixmap(pixmap)
            #self.pic.setPixmap(QtGui.QPixmap( "/home/lpp/Desktop/image2.png"))
            self.print_current_location()



    def fun_prev_pic_same_floor(self):
        if len(self.url_dict[self.url_floor_num[self.current_pic_floor]])>1:
            if self.current_pic_index > 0:
                self.current_pic_index=self.current_pic_index-1
            else:
                self.current_pic_index=len(self.url_dict[self.url_floor_num[self.current_pic_floor]])-1

            pixmap = QPixmap()
            data=self.retrieve_from_url(self.url_dict[self.url_floor_num[self.current_pic_floor]][self.current_pic_index])
            pixmap.loadFromData(data)
            self.pic.setPixmap(pixmap)
            self.print_current_location()

    def fun_next_floor(self):
        if self.current_pic_floor < len(self.url_floor_num)-1:
            self.current_pic_floor=self.current_pic_floor+1
        else:
            self.current_pic_floor=0
        self.current_pic_index=0

        pixmap = QPixmap()
        data=self.retrieve_from_url(self.url_dict[self.url_floor_num[self.current_pic_floor]][self.current_pic_index])
        pixmap.loadFromData(data)
        self.pic.setPixmap(pixmap)
        self.print_current_location()



    def fun_prev_floor(self):
        if self.current_pic_floor > 0:
            self.current_pic_floor=self.current_pic_floor-1
        else:
            self.current_pic_floor=len(self.url_floor_num)-1
            self.current_pic_index=0

        pixmap = QPixmap()
        data=self.retrieve_from_url(self.url_dict[self.url_floor_num[self.current_pic_floor]][self.current_pic_index])
        pixmap.loadFromData(data)
        self.pic.setPixmap(pixmap)
        self.print_current_location()



#---------------------------------------------

# class definition

class Browser_tieba:
    mouse_pwd_fix="27,17,15,26,21,19,16,42,18,15,19,15,18,15,19,15,18,15,19,15,18,15,19,15,18,15,19,42,17,18,27,22,18,42,18,26,17,19,15,18,19,27,19,"
    zklz=False
    pid_floor_map={}
    tiebaName_utf=""
    tiebaName_url=""    
    tiezi_link=[]
    shouye_index=1
    shouye_titles=[]
    last_viewed_tiezi_index=0
    current_view_tiezi_link=""
    tail=""
    #u"[br][br][br]---来自百度贴吧Python客户端[br][br]"
    #[url]http://www.jianshu.com/p/11b085d326c2[/url]
    #[emotion pic_type=1 width=30 height=30]//tb2.bdstatic.com/tb/editor/i_f25.png?t=20140803[/emotion]
    c = pycurl.Curl()

    def __init__(self):

        self.Load_cookie()
        #self.read_source()
        Welcome=u"""

 _  _  _         _                                               _______  _         _              
| || || |       | |                               _             (_______)(_)       | |             
| || || |  ____ | |  ____   ___   ____    ____   | |_    ___     _        _   ____ | | _    ____   
| ||_|| | / _  )| | / ___) / _ \ |    \  / _  )  |  _)  / _ \   | |      | | / _  )| || \  / _  |  
| |___| |( (/ / | |( (___ | |_| || | | |( (/ /   | |__ | |_| |  | |_____ | |( (/ / | |_) )( ( | |  
 \______| \____)|_| \____) \___/ |_|_|_| \____)   \___) \___/    \______)|_| \____)|____/  \_||_|  

                                 _                             _  _                                
                           _    | |                           | |(_)               _               
 ___  ___    ____   _   _ | |_  | | _    ___   ____      ____ | | _   ____  ____  | |_             
(___)(___)  |  _ \ | | | ||  _) | || \  / _ \ |  _ \    / ___)| || | / _  )|  _ \ |  _)            
            | | | || |_| || |__ | | | || |_| || | | |  ( (___ | || |( (/ / | | | || |__            
            | ||_/  \__  | \___)|_| |_| \___/ |_| |_|   \____)|_||_| \____)|_| |_| \___)           
            |_|    (____/                                                                          


简书: 用python写一个百度贴吧客户端
http://www.jianshu.com/p/11b085d326c2
by bigtrace

        """
        print Welcome



    def solve_bsk(self,tbs_str):
        driver = webdriver.Firefox()
        bsk_js_1="""
    function bsk_solver(tbs_str) {
    var MAP = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/<$+-%>{:* \\,}[^=_~&](")';
    var IN=tbs_str;  // this is tbs

    var OUT={};

    function encodeStr(str) {
        var res = [];
        for (var i = 0; i < str.length; i++) {
            res.push(3 + (5 ^ str.charCodeAt(i)) ^ 6)
        }
        return res
    }

    function decodeCharCode(code) {
        return (6 ^ code) - 3 ^ 5
    }

    function toCharCodeArr(str) {
        var res = [];
        for (var i = 0; i < str.length; i++) {
            res.push(str.charCodeAt(i))
        }
        return res
    }

    function decodeChar(code) {
        return String.fromCharCode(decodeCharCode(code))
    }

    function decodeStr(arr) {
        return map(flatten(arr), decodeChar).join('')
    }

    function fromCharCodes(charCodes) {
        return String.fromCharCode.apply(null, charCodes)
    }

    function map(arr, func) {
        var res = [];
        for (var i = 0; i < arr.length; i++) {
            res.push(func(arr[i], i))
        }
        return res
    }

    function isArr(wtf) {
        return wtf.push && 0 === wtf.length || wtf.length
    }

    function flatten(arr) {
        return isArr(arr) ? [].concat.apply([], map(arr, flatten)) : arr
    }

    function genRes(arr, map) {
        for (var i = 0; i < arr.length; i++) {
            arr[i] = decodeCharCode(arr[i]);
            arr[i] = arr[i] ^ map[i % map.length]
        }
        return arr
    }

    function nextFunc(funcs) {
        var index = Math.floor(Math.random() * funcs.length);
        return funcs.splice(index, 1)[0]
    }





    function startRun() {
        var isNodejs = false;
        try {
            isNodejs = Boolean(global.process || global.Buffer)
        } catch (n) {
            isNodejs = false
        }
        if (isNodejs) {
            var wtf = decodeStr(toCharCodeArr(MAP)); // bug: quote isn't escaped
            func = function () {
                var [key, func] = nextFunc(funcs);
                return `"${key}":""${wtf}"` // bug: duplicate quotes
            }
        } else {
            func = function () {
                var [key, func] = nextFunc(funcs);
                try {
                    var res = func();
                    if (res && res.charCodeAt) {
                        res = res.replace(/"/g, encodeStr('\\"')); // bug: encoded twice
                        return `"${key}":"${res}"`;
                    } else return `"${key}": ${res.toString()}`
                } catch (n) {
                    return `"${key}": 20170511`
                }
            }
        }
        var length = funcs.length;
        var str = `{${Array.from({length}).map(func).join()}}`;
        console.log(str);
        if (!isNodejs) {
            var charCodes = genRes(encodeStr(str), [94, 126, 97, 99, 69, 49, 36, 43, 69, 117, 51, 95, 97, 76, 118, 48, 106, 103, 69, 87, 90, 37, 117, 55, 62, 77, 103, 38, 69, 53, 70, 80, 81, 48, 80, 111, 51, 73, 68, 125, 117, 51, 93, 87, 100, 45, 42, 105, 73, 40, 95, 52, 126, 80, 56, 71]);
            var data = btoa(fromCharCodes(charCodes));
            OUT.data = data
        } else OUT.data = btoa(fromCharCodes(str))
        //console.log(OUT);
    }
    var funcs = [['p1', function () {
        return window.encodeURIComponent(window.JSON.stringify(IN))
}], ['u1', function () {
        return "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}], ['l1', function () {
        return "en-US"
}], ['s1', function () {
        return 1080
}], ['s2', function () {
        return 1920
}], ['w1', function () {
        return "NULL"
}], ['w2', function () {
        return "NULL"
}], ['a1', function () {
        return 1920
}], ['a2', function () {
        return 1040
}], ['s3', function () {
        return true
}], ['l2', function () {
        return true
}], ['i1', function () {
        return true
}], ['a3', function () {
        return false
}], ['p2', function () {
        return "Win32"
}], ['d1', function () {
        return "NULL"
}], ['c1', function () {
        return true
}], ['a4', function () {
        return false
}], ['p3', function () {
        return false
}], ['n1', function () {
        return 20170511
}], ['w3', function () {
        return false
}], ['e1', function () {
        return 20170511
}], ['n2', function () {
        return 20170511
}], ['n3', function () {
        return 20170511
}], ['r1', function () {
        return "function random() { [native code] }"
}], ['t1', function () {
        return "function toString() { [native code] }"
}], ['w4', function () {
        return "stop,open,alert,confirm,prompt,print,requestAnimationFrame,cancelAnimationFrame,requestIdleCallback,cancelIdleCallback,captureEvents,releaseEvents,getComputedStyle,matchMedia,moveTo,moveBy,resizeTo,resizeBy,getSelection,find,getMatchedCSSRules"
}], ['t2', function () {
        return Math.floor(Date.now() / 1000)
}], ['m1', function () {
        return 'basilisk_aLv0jg'
}]];
    startRun();

    return OUT.data
}


return bsk_solver('""" 

        bsk_js_full=bsk_js_1+tbs_str+"')"

        BSK_=driver.execute_script(bsk_js_full)
        driver.quit()
        return BSK_



    def read_source(self):
        fname_pic="pic_tail.txt"
        with open(fname_pic) as f:
            self.img_list = f.read().splitlines()
        f.close()
        fname_wenzi="wisdom_tail.txt"
        with codecs.open(fname_wenzi, "r", "utf-8") as f1:
            self.widsom_list = f1.read().splitlines()
        f1.close()



    def return_random_tail(self):
        tail_append=""

        # widsom in tail
        while True:
            list_widsom_index=randint(0,len(self.widsom_list)-1)
            data = "".join(self.widsom_list[list_widsom_index].split())
            if len(data)>80:
                break


        tail_append=tail_append+"[br]"+data+"[br]"




        #image in tail
        img_index=randint(0,len(self.img_list)-1)
        tail_append=tail_append+"[br]"+self.Get_size_of_url_img(self.img_list[img_index])+"[br]"


        # emoji in tail
        list_emoji=random.sample(xrange(1,70), 0)
        emoji_head="[emotion pic_type=1 width=30 height=30]https://gsp0.baidu.com/5aAHeD3nKhI2p27j8IqW0jdnxx1xbK/tb/editorlient/image_emoticon"
        emoji_tail=".png[/emotion] "

        for each_id in list_emoji:
            tail_append=tail_append+emoji_head+str(each_id)+emoji_tail

        return self.tail+tail_append

    def return_fix_tail(self):
        tail_append=""   #"[img pic_type=0 width=560 height=322]https://imgsa.baidu.com/forum/pic/item/ec4fa635e5dde711e2edc463adefce1b9d166111.jpg[/img]"


        return self.tail

    def return_tail(self):

        return self.return_fix_tail()




    def change_tieba(self):

        htmlparser = HTMLParser()
        self.tiebaName_utf = raw_input('Type tieba Name\n').decode(sys.stdin.encoding or locale.getpreferredencoding(True))   
        self.tiebaName_url=urllib.quote(self.tiebaName_utf.encode('utf-8')) # encode utf-8 as url
        self.shouye(1)


    def shouye(self,page):  


        print "************Shouye Layer "+ str(page) +"************"
        website = unicode('http://tieba.baidu.com/f?kw='+ self.tiebaName_url +'&ie=utf-8&pn=')
        link = website + unicode(str((page-1)*50))
        print "url="+link+"\n"
        buffer = StringIO()
        self.c.setopt(pycurl.URL, link)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()
        body = buffer.getvalue().decode('utf-8', 'ignore')
        self.fid = re.search(r"forum = {\s+'id': (\d+),", body).group(1)
        self.tbs = re.search(r"tbs.*\"(.*)\"", body).group(1)
        self.name_url = re.search(r"'name_url':\s+\"(.*)\"", body).group(1)

        doc = lxml.html.fromstring(body)
        posts_datafields = doc.xpath("//li[contains(@class, ' j_thread_list')]/@data-field")
        links = doc.xpath("//a[@class='j_th_tit ']/@href")
        titles = doc.xpath("//a[@class='j_th_tit ']/@title")
        Header_list=[]



        self.header_max_width=12
        self.title_max_width=70
        i=0
        for each_title in titles: 
            each_floors_data_field_json = json.loads(posts_datafields[i])
            poster=each_floors_data_field_json['author_name']
            poster_str=urllib.unquote(poster)
            reply_num=each_floors_data_field_json['reply_num']
            Header="index "+colored(str(i),'magenta')
            Tail=colored("--"+poster_str+ " ("+ str(reply_num)+") ",'cyan')
            each_title=":   "+each_title
            Header_list.append([Header,each_title,Tail])

            Header_fmt= u'{0:<%s}' % (self.header_max_width - self.wide_chars(Header))
            Title_fmt= u'{0:<%s}' % (self.title_max_width - self.wide_chars(each_title))
            try:
                print (Header_fmt.format(Header)+Title_fmt.format(each_title)  +  Tail).encode("gb18030")
            except:
                print (Header_fmt.format(Header)+"Title can't be displayed").encode("gb18030")
            print ""
            i=i+1

        self.tiezi_link=links
        self.shouye_titles=Header_list
        print "\n---------------------"





    def representInt(self,s):

        try:
            int(s)
            return True
        except ValueError:
            return False

    def go_into_each_post(self,index):
        self.pid_floor_map={}
        self.author_floor_map={}
        self.content_floor_map={}
        self.current_thread_img_list={}

        if self.representInt(index):
            each_post_link='https://tieba.baidu.com'+self.tiezi_link[index]+"?pn=1"
        else:
            each_post_link=index+'?pn=1'  # for specific url


        self.current_view_tiezi_link=each_post_link

        buffer = StringIO()
        self.c.setopt(pycurl.URL, each_post_link)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()
        body = buffer.getvalue().decode('utf-8', 'ignore')
        #----------------- get fid tbs tid for future reply purpose
        doc = lxml.html.fromstring(body)
        script = doc.xpath('//script[contains(., "PageData")]/text()')[0]
        data = re.search(r"{(.*)UTF", script).group(1)
        self.fid=re.search(r"fid:'(\d+)'", body).group(1)
        data="{"+data+"\"}"
        tbs_json = json.loads(data)
        self.tbs=tbs_json["tbs"]   # get tbs
        self.tid=re.search(r"p/(\d+)", self.current_view_tiezi_link).group(1)
        self.tiebaName_utf=re.search(r"forumName':\s+'(.*)',", body).group(1)



        #------------------


        title=doc.xpath("//*[self::h1 or self::h2 or self::h3][contains(@class, 'core_title_txt')]")
        try:
            print ("\n\n"+self.tiebaName_utf +" >> "+title[0].text_content()).encode("gb18030")
        except:
            print "************Tiezi : title can't be displayed************"
        #print each_post_link

        #--- get how many pages in total
        pager = doc.xpath("//li[@class='l_pager pager_theme_5 pb_list_pager']/a/@href")
        if pager:
            last_page=pager[-1]
            last_page_number = re.search(r"pn=(\d+)", last_page).group(1)
            page_list=range(1,int(last_page_number)+1)

            for each_page_num in page_list:
                #print each_page_num
                self.view_each_post_by_pages(index,each_page_num)
        else:
            self.view_each_post_by_pages(index,1)
        print "************no more replies************"



    def view_each_post_by_pages(self,index,page_number):
        if self.representInt(index):
            each_post_link='https://tieba.baidu.com'+self.tiezi_link[index]+"?pn="+str(page_number)
        else:
            each_post_link=index+"?pn="+str(page_number)  # for specific url


        buffer = StringIO()
        self.c.setopt(pycurl.URL, each_post_link)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()
        body = buffer.getvalue().decode('utf-8', 'ignore')
        doc = lxml.html.fromstring(body)

        print "\n_____________   Page "+str(page_number)+": "+ colored(each_post_link,'blue') + "   _____________\n"

        p_postlist_All=doc.xpath("//div[@id='j_p_postlist']/div[contains(@class, 'l_post')]")
        p_postlist_datafield=doc.xpath("//div[@id='j_p_postlist']/div[contains(@class, 'l_post')]/@data-field")



        i=0

        max_width = 20  # Align chinese character:fix width : change this number to fit your screen
        for each_data_field in p_postlist_datafield:





            each_floors_data_field_json = json.loads(each_data_field)

            poster=each_floors_data_field_json['author']['user_name']
            post_no=each_floors_data_field_json['content']['post_no']
            post_id=each_floors_data_field_json['content']['post_id']
            post_comment_num=each_floors_data_field_json['content']['comment_num']




            if 'level_id' not in each_floors_data_field_json['author']:

                # This is for tieba old DOM structure 
                if not p_postlist_All[i].xpath(".//div[@class= 'd_badge_lv']"):
                    i=i+1
                    continue  # this floor is AD
                else:
                    poster_level=p_postlist_All[i].xpath(".//div[@class= 'd_badge_lv']")[0].text_content()


                post_content=p_postlist_All[i].xpath(".//div[contains(@class, 'd_post_content j_d_post_content')]")[0].text_content().lstrip()



                if len(p_postlist_All[i].xpath(".//span[@class= 'tail-info']"))>2:
                    post_open_type=p_postlist_All[i].xpath(".//span[@class= 'tail-info']")[0].text_content().lstrip()


                else:
                    post_open_type=""

                post_date=p_postlist_All[i].xpath(".//span[@class= 'tail-info']")[-1].text_content()






            else:

                # This is for tieba new DOM structure 

                poster_level=each_floors_data_field_json['author']['level_id'] # only in Chouxiang tieba has this option
                post_date=each_floors_data_field_json['content']['date']
                post_open_type=each_floors_data_field_json['content']['open_type']
                post_content=p_postlist_All[i].xpath(".//div[contains(@class, 'd_post_content ')]")[0].text_content().lstrip()   #@class= 'd_post_content j_d_post_content  clearfix'





            self.pid_floor_map[str(post_no)]=str(post_id)
            self.author_floor_map[str(post_no)]=poster


            if self.zklz==True:
                if poster != self.author_floor_map['1']:
                    i=i+1
                    continue           


            poster_str=colored(urllib.unquote(poster),'green')   # transform from urlencode string to utf-8 string
            if poster == self.author_floor_map['1']:
                poster_str=poster_str+ " <"+str(poster_level)+u"> "+ colored(u"楼主" ,'red')   # add poster level 
            else:
                poster_str=poster_str+ " <"+str(poster_level)+">"    # add poster level 


            post_if_img=p_postlist_All[i].xpath(".//div[contains(@class, 'd_post_content')]/img[@class='BDE_Image']/@src")
            post_if_video=p_postlist_All[i].xpath(".//embed/@data-video")

            if post_if_video:
                post_content=post_content+ colored("\n<video url: " + post_if_video[0] +" >",'yellow')

            if post_if_img:
                #print "img detected!"
                img_list=[]
                for each_img_src in post_if_img:
                    post_content=post_content+ colored("\n<img url: " + each_img_src +" >",'yellow')
                    img_list.append(each_img_src)
                #pprint(img_list)
                self.current_thread_img_list[str(post_no)]=img_list





            poster_fmt= u'{0:<%s}' % (max_width - self.wide_chars(poster_str))




            content=""
            try:
                if post_comment_num>0:
                    content=colored(str(post_no),'cyan')+ "L : "+ poster_fmt.format(poster_str) +" : "+ post_content + u"  回复("+colored(str(post_comment_num),'green')+")"
                else:
                    content=colored(str(post_no),'cyan')+ "L : "+ poster_fmt.format(poster_str) +" : "+ post_content

            except:
                content=colored(str(post_no),'cyan')+ "L : "+ poster_fmt.format(poster_str)+  " can't be displayed "




            print (content).encode("gb18030")
            self.content_floor_map[str(post_no)]=content

            if post_open_type=="apple":
                print "iphone" + " " + post_date
            elif post_open_type=="":
                print "PC" + " " + post_date
            elif post_open_type=="android":
                print post_open_type + " " + post_date
            else:
                print post_open_type + " " + post_date



            i=i+1
            print "-----\n"



    def Load_cookie(self):

        with open('cookie.txt') as data_file:    
            data = json.load(data_file)
        chunks = []
        for cookie_each_element in data:
            name, value,domain = cookie_each_element['name'], cookie_each_element['value'],cookie_each_element['domain']
            name = quote_plus(name)
            value = quote_plus(value)
            chunks.append('%s=%s;domain=%s;' % (name, value,domain))
        self.c.setopt(pycurl.PROXY, 'http://192.168.87.15:8080')
        self.c.setopt(pycurl.PROXYUSERPWD, 'LL66269:')
        self.c.setopt(pycurl.PROXYAUTH, pycurl.HTTPAUTH_NTLM)
        self.c.setopt(self.c.FOLLOWLOCATION, 1)
        self.c.setopt(pycurl.VERBOSE, 0)
        self.c.setopt(pycurl.FAILONERROR, True)
        self.c.setopt(pycurl.COOKIE, ''.join(chunks))
    #------------------- Need to use each post page's own cookie to login
        url_tbs = 'http://tieba.baidu.com/dc/common/tbs'
        buffer = StringIO()
        self.c.setopt(pycurl.URL, url_tbs)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()
        body=buffer.getvalue()
        dic_json = json.loads(body)
        print "islogin="+ str(dic_json['is_login'])  # check if logged in


    def Reply_to_floor(self,floor_num):

        content = raw_input('you replied:\n').decode(sys.stdin.encoding or locale.getpreferredencoding(True))


        pid=self.pid_floor_map[str(floor_num)]

        kw=self.tiebaName_utf

        data_form = {
            'ie': 'utf-8',
            'kw': kw.encode("utf-8"),
            'fid': self.fid,
            'tid': self.tid,
            'floor_num':floor_num,
            'quote_id':pid,
            'rich_text':'1',
            'lp_type':'0',
            'lp_sub_type':'0',
            'tag':'11',
            'content': content.encode("utf-8"),
            'tbs': self.tbs,
            'basilisk':'1',
            'new_vcode':1,
            'repostid':pid,
            'anonymous':'0',
            '_BSK' : self.solve_bsk(self.tbs)
            }

        #pprint (data_form)

        buffer = StringIO()
        data_post = urllib.urlencode(data_form)
        url = 'https://tieba.baidu.com/f/commit/post/add'
        self.c.setopt(pycurl.URL, url)
        self.c.setopt(pycurl.POST, 1)
        self.c.setopt(pycurl.POSTFIELDS, data_post)
        self.c.setopt(self.c.WRITEFUNCTION, buffer.write)
        self.c.setopt(pycurl.VERBOSE, 0)
        self.c.perform()

        response = buffer.getvalue()   #here we got the response data
        response_json = json.loads(response)



        is_succeed=response_json["no"]   # error code: 0 means success
        if is_succeed==0:
            print "comment successfully!"
        else:
            pprint (response_json)
            print "comment failed!"


    def lzl_more(self,floor_num):

        print ("\n\n"+self.content_floor_map[str(floor_num)]+"\n").encode("gb18030")

        pid=self.pid_floor_map[str(floor_num)]
        lzl_more_url="https://tieba.baidu.com/p/comment?tid="+self.tid+"&pid="+ pid +"&pn=1"
        buffer = StringIO()
        self.c.setopt(pycurl.URL, lzl_more_url)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()
        body = buffer.getvalue().decode('utf-8', 'ignore')
        doc = lxml.html.fromstring(body)

        lzl_single_post = doc.xpath('//li[contains(@class, "lzl_single_post")]')
        max_width=20

        for each_lzl in lzl_single_post:

            lzl_user=each_lzl.xpath('.//a[contains(@class, "at j_user_card ")]')[0].text
            lzl_content_main=each_lzl.xpath('.//span[contains(@class, "lzl_content_main")]')[0].text_content()
            poster_fmt= u'{0:<%s}' % (max_width - self.wide_chars(lzl_user))
            print (poster_fmt.format(lzl_user)+" : "+lzl_content_main).encode("gb18030")


        lzl_page = doc.xpath('//li[contains(@class, "lzl_li_pager")]/@data-field')[0]
        page_json = json.loads(lzl_page)
        total_page=page_json["total_page"]   # get lzl total page count
        count=1

        #print "\n<total_page:"+str(total_page)+">"

        if total_page>1:

            while (count<total_page):

                count=count+1
                lzl_more_url="https://tieba.baidu.com/p/comment?tid="+self.tid+"&pid="+ pid +"&pn=" +str(count)
                #print lzl_more_url
                buffer = StringIO()
                self.c.setopt(pycurl.URL, lzl_more_url)
                self.c.setopt(self.c.WRITEDATA, buffer)
                self.c.perform()
                body = buffer.getvalue().decode('utf-8', 'ignore')
                doc = lxml.html.fromstring(body)
                lzl_single_post = doc.xpath('//li[contains(@class, "lzl_single_post")]')
                for each_lzl in lzl_single_post:
                    lzl_user=each_lzl.xpath('.//a[contains(@class, "at j_user_card ")]')[0].text
                    lzl_content_main=each_lzl.xpath('.//span[contains(@class, "lzl_content_main")]')[0].text_content()
                    poster_fmt= u'{0:<%s}' % (max_width - self.wide_chars(lzl_user))
                    print (poster_fmt.format(lzl_user)+" : "+lzl_content_main).encode("gb18030")


        print "------------------------------"




    def view_image(self):
        print "launch picture viewer..."
        viewer_app = QtGui.QApplication(sys.argv)
        ex = Pic_Viewer(self.current_thread_img_list)
        sys.exit(viewer_app.exec_())





    def Make_New_Post(self):

        title = raw_input('your title:\n').decode(sys.stdin.encoding or locale.getpreferredencoding(True))

        content = raw_input('your content:\n').decode(sys.stdin.encoding or locale.getpreferredencoding(True))
        url_img = raw_input('any img url to insert?\n').decode(sys.stdin.encoding or locale.getpreferredencoding(True))

        if url_img !="":
            url_img_upload=self.Get_size_of_url_img(url_img)
            content=url_img_upload+"[br]"+content+self.return_tail()
        else:
            content=content+self.return_tail()


        #----

        shouye_link=website = unicode('http://tieba.baidu.com/f?kw='+ self.tiebaName_url +'&ie=utf-8&pn=1')
        buffer = StringIO()
        self.c.setopt(pycurl.URL, shouye_link)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()
        body=buffer.getvalue().decode('utf-8', 'ignore')
        tree = lxml.html.fromstring(body)
        fid=re.search(r"forum = {\s+'id'\:\s+(\d+)", body).group(1)

        tbs=re.search(r"tbs\': \"(\w+)\"", body).group(1)   # get tbs

        kw=self.tiebaName_utf
        mouse_pwd_t=str(int(time.time()))
        mouse_pwd=mouse_pwd_t+'0'
        mouse_pwd=self.mouse_pwd_fix+mouse_pwd

        # using signature

        signature=[{'id':15309379,'name':'西财'},{'id':43817160,'name':'早乙女1'},{'id':43817169,'name':'早乙女2'},{'id':24324097,'name':'ubw'},{'id':43817177,'name':'早乙女3'}]
        id_=randint(0,len(signature)-1)
        sign_id=signature[id_]['id']
        #

        data_form = {
            'ie': 'utf-8',
            'kw': kw.encode("utf-8"),
            'fid': fid,
            'tid': '0',
            'content': content.encode("utf-8"),
            'title':title.encode("utf-8"),
            'rich_text': '1',
            'tbs': tbs,
            'floor_num':'0',
            'sign_id':sign_id,
            'mouse_pwd':mouse_pwd,
            'mouse_pwd_t':mouse_pwd_t,
            '__type__': 'thread',
            'mouse_pwd_isclick':'0',
            '_BSK' : self.solve_bsk(tbs)
            }


        buffer = StringIO()
        data_post = urllib.urlencode(data_form)
        url = 'http://tieba.baidu.com/f/commit/thread/add'
        self.c.setopt(pycurl.URL, url)
        self.c.setopt(pycurl.POST, 1)
        self.c.setopt(pycurl.POSTFIELDS, data_post)
        self.c.setopt(self.c.WRITEFUNCTION, buffer.write)
        self.c.perform()

        response = buffer.getvalue()   #here we got the response data
        response_json = json.loads(response)
        is_succeed=response_json["no"]   # error code: 0 means success
        if is_succeed==0:
            print "post successfully!"
        else:
            print "post failed!"


    def Reply_this_post(self):
        url_img = raw_input('any img url to insert?\n').decode(sys.stdin.encoding or locale.getpreferredencoding(True))
        content = raw_input('you replied:\n').decode(sys.stdin.encoding or locale.getpreferredencoding(True))
        if url_img !="":
            url_img_upload=self.Get_size_of_url_img(url_img)
            content=content+"[br]"+url_img_upload+self.return_tail()
            #print content
        else:
            content=content+self.return_tail()

        kw=self.tiebaName_utf


        mouse_pwd_t=str(int(time.time()))
        mouse_pwd=mouse_pwd_t+'0'
        mouse_pwd=self.mouse_pwd_fix+mouse_pwd


        # using signature

        signature=[{'id':15309379,'name':'西财'},{'id':43817160,'name':'早乙女1'},{'id':43817169,'name':'早乙女2'},{'id':24324097,'name':'ubw'},{'id':43817177,'name':'早乙女3'}]
        id_=randint(0,len(signature)-1)
        sign_id=signature[id_]['id']

        data_form = {
            'ie': 'utf-8',
            'kw': kw.encode("utf-8"),
            'fid': self.fid,
            'tid': self.tid,
            'content': content.encode("utf-8"),
            'is_login': '1',
            'rich_text': '1',
            'tbs': self.tbs,
            'sign_id':sign_id,
            'mouse_pwd':mouse_pwd,
            'mouse_pwd_t':mouse_pwd_t,
            '__type__': 'reply',
            'mouse_pwd_isclick':'0',
            '_BSK' : self.solve_bsk(self.tbs)
            }

        #pprint (data_form)
        buffer = StringIO()
        data_post = urllib.urlencode(data_form)
        url = 'https://tieba.baidu.com/f/commit/post/add'
        self.c.setopt(pycurl.URL, url)
        self.c.setopt(pycurl.POST, 1)
        self.c.setopt(pycurl.POSTFIELDS, data_post)
        self.c.setopt(self.c.WRITEFUNCTION, buffer.write)
        self.c.setopt(pycurl.VERBOSE, 0)
        self.c.perform()

        response = buffer.getvalue()   #here we got the response data
        response_json = json.loads(response)
        is_succeed=response_json["no"]   # error code: 0 means success
        if is_succeed==0:
            print "comment successfully!"
        else:
            pprint (response_json)
            print "comment failed!"

    def Get_size_of_url_img(self,url_img):
        fp = open("img_upload", "wb")
        img_c = pycurl.Curl()
        USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
        img_c.setopt(pycurl.PROXY, 'http://192.168.87.15:8080')
        img_c.setopt(pycurl.PROXYUSERPWD, 'LL66269:')
        img_c.setopt(pycurl.PROXYAUTH, pycurl.HTTPAUTH_NTLM)
        img_c.setopt(img_c.FOLLOWLOCATION, 1)
        img_c.setopt(pycurl.VERBOSE, 0)
        img_c.setopt(pycurl.FAILONERROR, True)
        img_c.setopt(pycurl.USERAGENT, USER_AGENT)
        img_c.setopt(pycurl.URL, url_img)
        img_c.setopt(pycurl.WRITEDATA, fp)
        img_c.perform()
        img_c.close()
        fp.close()
        im=Image.open("img_upload")
        width, height = im.size
        #if width>550:
        #    print "width is suggested to be less than 550"
        #print pos.stat("img_upload").st_size

        url_upload="[img pic_type=1 width="+ str(width) +" height="+str(height)+"]"+url_img+"[/img]"
        return(url_upload)

    def my_reply(self,page):
        print u"我回复的:\n"
        my_reply_link="http://tieba.baidu.com/i/i/my_reply?&pn="+str(page)

        buffer = StringIO()
        self.c.setopt(pycurl.URL, my_reply_link)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()
        body = buffer.getvalue().decode('utf-8', 'ignore')

        doc = lxml.html.fromstring(body)

        reply_block = doc.xpath("//div[contains(@class, 'block t_forward clearfix ')]")
        for block in reply_block:
            if block.xpath(".//a[contains(@class, 'for_reply_context')]"):
                block_my_reply=block.xpath(".//a[contains(@class, 'for_reply_context')]")[0].text_content()
            else:
                block_my_reply=u"emoji or pure picture"
            block_common_source_main=block.xpath(".//div[contains(@class, 'common_source_main')]")[0]
            tiezi_url="http://tieba.baidu.com"+block_common_source_main.xpath("./a[1]/@href")[0]
            tiezi_title=block_common_source_main.xpath("./a[1]")[0].text


            tiezi_text=block_common_source_main.text_content()

            reply_num=re.search(r"(\(\d*\))", tiezi_text).group(1)

            block_tieba_name=block_common_source_main.xpath("./a[3]")[0].text






            print ("'"+block_my_reply+"'").encode("gb18030")
            print ("from:    "+tiezi_title + " "+reply_num + " -- "+block_tieba_name).encode("gb18030")
            print "url:    "+tiezi_url
            print """

            """

    def my_forum(self):
        #http://tieba.baidu.com/mo/q---995BABCDC4E864DFC079CE055F7D0C57%3AFG%3D1--0-1-0--2/m?tn=bdFBW&tab=favorite
        print u"我关注的贴吧:\n"
        my_forum_link="http://tieba.baidu.com/mo/q---995BABCDC4E864DFC079CE055F7D0C57%3AFG%3D1--1-3-0--2--wapp_1499966495430_639/m?tn=bdFBW&tab=favorite"
        buffer = StringIO()
        self.c.setopt(pycurl.URL, my_forum_link)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()
        body = buffer.getvalue()#.decode('utf-8', 'ignore')
        doc = lxml.html.fromstring(body)
        pagination = doc.xpath("//tr")
        #print pagination
        max_width = 20
        for each_page in pagination:
            each_forum=each_page.xpath("./td[1]")[0].text_content()
            each_forum_level=each_page.xpath("./td[2]")[0].text_content()
            poster_fmt= u'{0:<%s}' % (max_width - self.wide_chars(each_forum))
            print  (poster_fmt.format(each_forum) +" <"+ each_forum_level  +">").encode("gb18030")



    def my_tie(self):
        print u"我的贴子:\n"
        my_forum_link="http://tieba.baidu.com/i/i/my_tie"
        buffer = StringIO()
        self.c.setopt(pycurl.URL, my_forum_link)
        self.c.setopt(self.c.WRITEDATA,buffer)
        self.c.perform()

        body=buffer.getvalue().decode('utf-8', 'ignore')
        doc=lxml.html.fromstring(body)
        tiezi_list=doc.xpath("//div[@class='simple_block_container']/ul/li")
        #http://tieba.baidu.com/p/5228903492?pid=109502281993
        for each_tiezi in tiezi_list:
            tiezi_text=each_tiezi.text_content()
            tiezi_link="http://tieba.baidu.com"+each_tiezi.xpath(".//a[@class='thread_title']/@href")[0]

            print (tiezi_text).encode("gb18030")
            print tiezi_link
            print "---\n\n"

    def onekeySignin(self):
        #'tbs': '2b506030c2989d171500408206'
        #my_forum_link="https://tieba.baidu.com/index.html"
        #file_out = codecs.open("mao_out.txt", "w", "utf-8")
        #buffer = StringIO()
        #self.c.setopt(pycurl.URL, my_forum_link)
        #self.c.setopt(self.c.WRITEDATA,buffer)
        #self.c.perform()

        #body=buffer.getvalue().decode('utf-8', 'ignore')
        #file_out.write(body)
        #tbs=re.search(r"PageData\.tbs.*\"(.*)\"", body).group(1)

        data_form = {
            'ie': 'utf-8',
            'kw':self.tiebaName_utf.encode("utf-8"),
            'tbs': self.tbs,

            }


        buffer = StringIO()
        data_post = urllib.urlencode(data_form)
        url = 'https://tieba.baidu.com/sign/add'
        self.c.setopt(pycurl.URL, url)
        self.c.setopt(pycurl.POST, 1)
        self.c.setopt(pycurl.POSTFIELDS, data_post)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()

        response = buffer.getvalue()   #here we got the response data
        response_json = json.loads(response)

        is_succeed=response_json["error"]  
        if is_succeed=="":
            print u"签到成功!"

        else:
            print (is_succeed).encode("gb18030")

    def like(self):



        data_form = {
            'fid': self.fid ,
            'ie': 'gbk',
            'fname':self.tiebaName_utf.encode("utf-8"),
            'uid' :self.name_url,
            'tbs': self.tbs,

            }


        buffer = StringIO()
        data_post = urllib.urlencode(data_form)

        url = 'http://tieba.baidu.com/f/like/commit/add'
        self.c.setopt(pycurl.URL, url)
        self.c.setopt(pycurl.POST, 1)
        self.c.setopt(pycurl.POSTFIELDS, data_post)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()

        response = buffer.getvalue()   #here we got the response data
        response_json = json.loads(response)

        is_succeed=response_json["error"]  
        level_name=response_json["level_name"]  
        #pprint (response_json)
        if is_succeed=="":
            print u"已关注"
            print (u"本吧头衔: "+level_name).encode("gb18030")


        else:
            print u"关注失败"





    def dislike(self):

        data_form = {
            'fid': self.fid ,
            'ie': 'gbk',
            'fname':self.tiebaName_utf.encode("utf-8"),
            'uid' :self.name_url,
            'tbs': self.tbs,

            }


        buffer = StringIO()
        data_post = urllib.urlencode(data_form)

        url = 'http://tieba.baidu.com/f/like/commit/delete'
        self.c.setopt(pycurl.URL, url)
        self.c.setopt(pycurl.POST, 1)
        self.c.setopt(pycurl.POSTFIELDS, data_post)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()
        response = buffer.getvalue()   #here we got the response data
        response_json = json.loads(response)

        #pprint (response_json)
        is_succeed=response_json["data"]["ret"]["is_done"]  
        if is_succeed==True:
            print u"已取消关注"

        else:
            print u"取消关注失败,或者未关注该吧"




    def replyme(self):
        print u"回复我的:\n"

        replyme_link="http://tieba.baidu.com/i/i/replyme"
        buffer = StringIO()
        self.c.setopt(pycurl.URL, replyme_link)
        self.c.setopt(self.c.WRITEDATA, buffer)
        self.c.perform()
        body = buffer.getvalue().decode('utf-8', 'ignore')
        doc = lxml.html.fromstring(body)
        reply_list = doc.xpath("//div[@id='feed']/ul/li")

        max_width = 20
        for each_reply in reply_list:
            replyme_user=each_reply.xpath(".//div[@class='replyme_user']")[0].text_content()
            replyme_content=each_reply.xpath(".//div[@class='replyme_content']")[0].text_content()
            replyme_url=each_reply.xpath(".//div[@class='replyme_content']/a/@href")[0]
            replyme_url="http://tieba.baidu.com"+replyme_url
            feed_from=each_reply.xpath(".//div[@class='feed_from']")[0].text_content()


            poster_fmt= u'{0:<%s}' % (max_width - self.wide_chars(replyme_user))
            print  (poster_fmt.format(replyme_user) +" replied:  "+ replyme_content  +"   \n\n     -- " +  feed_from.lstrip().rstrip() +"" ).encode("gb18030")
            print "url:    "+replyme_url
            print """

            """



    def Get_Back_To_shouye(self):
        print "************Shouye Layer************"
        i=0

        for Header,each_title,Tail in self.shouye_titles: 
            Header_fmt= u'{0:<%s}' % (self.header_max_width - self.wide_chars(Header))
            title_fmt= u'{0:<%s}' % (self.title_max_width - self.wide_chars(each_title))
            try:
                print (Header_fmt.format(Header)+title_fmt.format(each_title)  +  Tail).encode("gb18030")
            except:
                print (Header_fmt.format(Header)+"Title can't be displayed").encode("gb18030")
            i=i+1
            print ""
        print "\n---------------------"

    def wide_chars(self,s):
        #return the extra width for wide characters
        if isinstance(s, str):
            s = s.decode('utf-8')
        return sum(unicodedata.east_asian_width(x) in ('F', 'W') for x in s)

    def encode_utf_html(self,input_unicode):
        res = []
        for b in input_unicode:
            o = ord(b)
            if o > 255:
                res.append('&#{};'.format(o))
            else:
                res.append(b)

        res_string = ''.join(res)
        return(res_string)

    def Refresh_tiezi(self):
        if self.last_viewed_tiezi_index>0:
            self.go_into_each_post(self.last_viewed_tiezi_index)   
        else:
            print "you haven't viewed any tiezi yet"


    def Refresh_shouye(self):
        self.shouye(1)     
    def exit(self):
        self.c.close()


# main function


app=Browser_tieba()

while True:
    print """


    """
    nb = raw_input('Give me your command (or type help to see your options): \n')
    try:
        if nb.startswith( 's ' )==True:
            sp=re.search(r"s\s+(\d+)", nb).group(1)
            sp=int(sp)
            if sp>=1:
                app.shouye(sp)

        elif nb.startswith( 'help' )==True:
            help="----- Help for different command -----\n" 
            a="a -Begin to surf around tieba with its name (First step !)\n"
            s="s -go to specific pages; How to use: (s 10)\n"
            t="t -go to specific tiezi; How to use: (t 12) or (t https://tieba.baidu.com/p/4803063434)\n" 
            pic="pic - launch image viewer to browse all the picutures in current thread\n"
            p="p - make a new post\n"
            r="r -reply to either OP or to a specific floor; How to use: (r) or (r 12)\n"
            lzl="lzl -view lzl content for a specific floor; How to use: (lzl 12)\n"
            zklz="zklz - View comment made by OP only;\n"
            f="f -refresh posts in shouye;\n"
            ft="ft -refresh comments for the current post;\n"
            b="b -go back to the list of all the posts in shouye; \n"
            mf="mf -view all your favorite tieba ; \n"
            mr="mr -view your most recent comments ; \n"
            rm="rm -view who replied to you ; \n"
            mt="mt -view all thread posted by you \n"
            signin="si - sign in (si) \n"
            like = "like - like this forum (like); dislike this forum (dislike)\n"
            e="e -exit  the browser;\n"
            c="c -clear the screen;\n"
            end="--------------------------------------"
            print help+a+s+t+pic+p+r+zklz+lzl+f+ft+b+mf+mr+rm+signin+like+e+c+end


        elif nb.startswith( 't ' )==True:
            index=re.search(r"t\s+(.*)", nb).group(1)
            try:
                index=int(index)
                if index>=50 or index <=0:
                    print "put correct index: 1-49"
                    continue
                else:
                    app.go_into_each_post(index)
                    app.last_viewed_tiezi_index=index

            except:
                if 'fid' in index:
                    index = raw_input('exclude fid part in url and type again: \n')
                app.go_into_each_post(index)
                app.last_viewed_tiezi_index=index

        elif nb.startswith( 'r ' )==True:

            floor_num=re.search(r"r\s+(\d+)", nb).group(1)

            app.Reply_to_floor(floor_num)


        elif nb.startswith( 'lzl ' )==True:
            floor_num=re.search(r"(\d+)", nb).group(1)
            app.lzl_more(floor_num)



        elif nb =="r":
            app.Reply_this_post()
        elif nb =="rm":
            app.replyme()
        elif nb =="mt":
            app.my_tie()


        elif nb =="p":
            app.Make_New_Post()

        elif nb =="mr":
            app.my_reply(1)

        elif nb =="mf":
            app.my_forum()
        elif nb =="si":
            app.onekeySignin()

        elif nb =="like":
            app.like()
        elif nb =="dislike":
            app.dislike()

        elif nb.startswith( 'mr ' )==True:
            page_num=re.search(r"(\d+)", nb).group(1)
            app.my_reply(page_num)


        elif nb=="pic":
            app.view_image()


        elif nb == "f":
            print "refreshing shouye"
            app.Refresh_shouye()

        elif nb =="b":
            app.Get_Back_To_shouye()

        elif nb =="e":
            break

        elif nb =="c":
            os.system('cls')  # on windows

        elif nb =="a":
            app.change_tieba() 

        elif nb.startswith( 'zklz' )==True:
            app.zklz=True
            print u"只看楼主"
            app.Refresh_tiezi()
            app.zklz=False

        elif nb.startswith( 'url:' )==True:
            app.Refresh_tiezi()




        elif nb =="ft":  # refresh this post only
            app.Refresh_tiezi()


        else:
            print "Please type the correct command"
    except:
        print ""

print """
 _                    _                 
| |                  | |                
| |__  _   _ _____   | |__  _   _ _____ 
|  _ \| | | | ___ |  |  _ \| | | | ___ |
| |_) ) |_| | ____|  | |_) ) |_| | ____|
|____/ \__  |_____)  |____/ \__  |_____)
      (____/               (____/       

"""
app.exit()

Viewing only the OP (只看楼主)

Type zklz; only the floors posted by the OP are shown.

Browsing the front page

When the program starts it automatically reads your Baidu account cookie file. If the login succeeds it prints islogin=1; if it fails, islogin=0.

Then you can type help to see the common commands.

The help output lists the following options:

Author: bigtrace

Entering a post by URL

If you already know a post's URL, enter it in the form t url, for example t https://tieba.baidu.com/p/4803063434.

The program fetches every reply in that post, annotating each floor with the poster's ID and in-forum level, the client and time of posting, and the number of sub-replies on that floor.

If a floor contains a video link, the program shows the download link; if it contains images, their URLs are shown.

If a post is very long, say several hundred pages of replies, the program walks through all the pages one by one.

Expanding floor-in-floor replies (楼中楼)

When you want to see the sub-replies inside a specific floor, use a command such as lzl 25; the program prints all the floor-in-floor replies of floor 25.


A couple of notes

One thing to note: Baidu recently added a nickname-change feature, but the program always shows a user's original ID, so it may differ from the ID shown on the web version.

If you want to switch to a different forum, type a and then the name of the other forum you want to browse.

Browsing my recent replies

Type mr.


This crawls the "my replies" page.