Python Requests by Chris Hawkes

I stumbled upon this video and found it very informative, so I jotted down the key points along the way. The guy talks in a low-key fashion and shows every detail as he figures things out, which was very helpful for me.

First, set up a virtual environment: virtualenv env --no-site-packages
When it is created, it points at the base Python in c:\users\username\appdata\local\programs\python\python35-32,
puts the new executable in env\Scripts\python.exe,
and installs setuptools, pip and wheel.
Inside the virtual environment we need to install packages manually with pip install; once that is done, they go to
\env\lib\site-packages, together with setuptools, pip, wheel, wheel-0.21.0.dist-info, __pycache__ etc.
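A quick way to confirm that the interpreter you are running really is the one inside the environment is to look at a few sys attributes. This is a minimal sketch and not something shown in the video; the helper name in_virtual_environment is made up for illustration, and the real_prefix check is only there because older virtualenv releases set that attribute.

```python
import sys


def in_virtual_environment():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; the venv module and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("executable:", sys.executable)
    print("prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtual_environment())
```

Run it once with the system Python and once with env\Scripts\python.exe; the executable path and the last line should differ between the two.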

To monkey around and figure things out, use print a lot. For example, to see what kind of object a call returned, type print(type(object)); he got a requests.models.Response class object. He then peeked into the models.py file under the requests folder and found the class, and under that class there is __getstate__:

def __getstate__(self):
    # Consume everything; accessing the content attribute makes
    # sure the content has been fully read.
    if not self._content_consumed:
        self.content
    return {attr: getattr(self, attr, None) for attr in self.__attrs__}
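The exploration habit described above (print the type, then open the file that defines it) can be sketched with the standard library alone. This is only an illustration under my own assumptions: json.JSONDecoder stands in for the response object so the snippet runs without a network connection, and with a real response you would pass type(test) instead.

```python
import inspect
import json

decoder = json.JSONDecoder()  # stand-in object, not part of the original notes

# Step 1: ask the object what it is.
print(type(decoder))

# Step 2: find the file that defines its class, so you can open it and read it.
source_file = inspect.getsourcefile(type(decoder))
print(source_file)

# Step 3: list the public attributes worth poking at next.
public_names = [name for name in dir(decoder) if not name.startswith("_")]
print(public_names)
```

The path printed in step 2 is the equivalent of finding models.py under the requests folder by hand.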

Browsing around, he found the text property under this class too:

@property
def text(self):
    """Content of the response, in unicode.

    If Response.encoding is None, encoding will be guessed using
    ``chardet``.

    The encoding of the response content is determined based solely on HTTP
    headers, following RFC 2616 to the letter. If you can take advantage of
    non-HTTP knowledge to make a better guess at the encoding, you should
    set ``r.encoding`` appropriately before accessing this property.
    """

    # Try charset from content-type
    content = None
    encoding = self.encoding

    if not self.content:
        return str('')

    # Fallback to auto-detected encoding.
    if self.encoding is None:
        encoding = self.apparent_encoding

    # Decode unicode from given encoding.
    try:
        content = str(self.content, encoding, errors='replace')
    except (LookupError, TypeError):
        # A LookupError is raised if the encoding was not found which could
        # indicate a misspelling or similar mistake.
        #
        # A TypeError can be raised if encoding is None
        #
        # So we try blindly encoding.
        content = str(self.content, errors='replace')

    return content
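The decode-with-fallback idea in that property can be tried on plain bytes without making any request. The following is a simplified sketch, not the library's code: decode_body and its fallback parameter are names I made up, and a fixed fallback encoding takes the place of the chardet-based guess.

```python
def decode_body(raw, encoding=None, fallback="utf-8"):
    """Decode bytes to str, tolerating an unknown or missing encoding."""
    if not raw:
        return ""
    try:
        # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
        return str(raw, encoding or fallback, errors="replace")
    except LookupError:
        # The encoding name was not recognised (for example a misspelling).
        return str(raw, fallback, errors="replace")


utf8_bytes = "caf\u00e9".encode("utf-8")

# Decoded with the right encoding, the text comes back intact.
print(decode_body(utf8_bytes, "utf-8"))

# Decoded with the wrong one, nothing raises but the accented letter is garbled.
print(decode_body(utf8_bytes, "ISO-8859-1"))
```

The second print shows why setting the encoding by hand only helps when it matches what the server actually sent.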

So he could try it out with print(test.text). He then ran into an encoding error, so as a brute-force fix he typed print(test.text.encode('utf-8')).
After testing with print, it is time to write the output to a file:
outfile = open("/projects/learningfolder/test.txt", "w")
outfile.write(str(test.text.encode('utf-8')))
outfile.close()
Even so, the format is still not human-friendly, because str() on a bytes object writes its b'...' representation, escape sequences included.

So he set test.encoding = 'ISO-8859-1' instead, which prettied up the format of the outfile.
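For comparison, here is a small sketch of the two ways of writing the text out. It is my own illustration rather than what the video does: the sample string stands in for test.text, and a temporary folder replaces the real path. Passing encoding= to open() and writing the str directly avoids the b'...' wrapper altogether.

```python
import os
import tempfile

text = "caf\u00e9\nsecond line\n"  # stand-in for test.text

# str() on bytes gives the b'...' repr, with newlines shown as \n escapes.
as_repr = str(text.encode("utf-8"))
print(as_repr)

# Alternative: let open() do the encoding and write the str itself.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as outfile:
    outfile.write(text)

# Read the file back to check that nothing was lost on the way.
with open(path, encoding="utf-8") as infile:
    round_trip = infile.read()
print(round_trip == text)
```

The with block also closes the file for you, so there is no separate close() call to forget.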

The cookies on the response can be dumped as a plain dict:
print(test.cookies.get_dict())

Use the snippet below to poke around a website's source code with the Python requests module.

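import requests

my_list_of_links = [
    "https://www.google.com/",
]
for index, link in enumerate(my_list_of_links):
    # Query-string parameters; requests appends them to the URL as ?q=test
    payload = {'q': 'test'}
    test = requests.get(link, params=payload)
    # Override the encoding used to decode test.text
    test.encoding = 'ISO-8859-1'
    print(test.text)
    # The final URL, including the encoded query string
    print(test.url)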
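To see what the params argument contributes to the URL without sending anything, the same query string can be built with the standard library. This is a rough sketch of the idea, not how requests implements it internally; the variable names are mine.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "https://www.google.com/"
payload = {"q": "test"}

# Percent-encode the parameters and append them after "?".
full_url = base_url + "?" + urlencode(payload)
print(full_url)

# Going the other way: pull the query string apart again.
query = parse_qs(urlsplit(full_url).query)
print(query)
```

Comparing full_url with the test.url printed by the snippet above is a handy check that the payload was sent the way you expected.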

Google puts a peculiar # prefix before the search terms. Anyway, nowadays using an API is more convenient than relying on Requests.
