Working Hard So Life Can Be Easy: python string

Python really is a powerful text processor.
Case in point: punching above my level, I went from zero to writing Python in a single night at a hackathon.

1. Opening and reading files

Opening a file works much like fopen in C:

fp = open(filePath, mode)

For the list of available modes, see ^[1].

For reading, there are two methods I know of:

str1 = fp.read(count)
str2 = fp.readline()

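The first reads at most count characters (bytes in Python 2), fewer if the file ends first; if count is omitted, it reads everything up to the end of the file.
The second reads one whole line, up to and including the \n; at the end of the file it returns an empty string.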
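A minimal sketch of both read methods, assuming Python 3 (the post's input examples are Python 2). It first writes a throwaway sample file, whose name and contents are made up for illustration, then reads it back with read(count), readline() and a bare read(). The with statement closes the file automatically when the block ends.

```python
import os
import tempfile

# Write a small sample file first (the name "sample.txt" is just an example).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as fp:
    fp.write("hello world\nsecond line\n")

# "with" closes the file automatically, like calling fp.close() at the end.
with open(path, "r") as fp:
    first5 = fp.read(5)    # at most 5 characters: 'hello'
    rest = fp.readline()   # from the current position through '\n': ' world\n'
    line2 = fp.readline()  # 'second line\n' (the newline is kept)
    tail = fp.read()       # no count: everything left, which is '' at EOF

print(repr(first5), repr(rest), repr(line2), repr(tail))
```

Note that readline() continues from wherever the previous read stopped, which is why it returns only the remainder of the first line.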

2. Input

There are two common ways to read keyboard input (in Python 2):

str1 = raw_input("prompt text")
str2 = input("prompt text")

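The first reads the whole line as a string; spaces and symbols are all fine.
The second also processes what was typed: in Python 2, input() is equivalent to eval(raw_input()),
so if you type an expression, str2 holds its result rather than the expression itself.
(In Python 3, raw_input was renamed to input, and the new input no longer evaluates anything.)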
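To see the difference without typing anything, the sketch below (Python 3) fakes one typed line. Since Python 2's input() behaves like eval(raw_input()), the hypothetical helper py2_style_input simply calls eval on the text. Keep in mind that eval runs arbitrary code, so it is unsafe on untrusted input.

```python
def py2_style_input(typed_line):
    """Hypothetical helper mimicking Python 2's input(): evaluate the typed text.
    Warning: eval executes arbitrary code; never use it on untrusted input."""
    return eval(typed_line)

typed = "1 + 2 * 3"                # pretend the user typed this line
as_text = typed                    # what raw_input() (Python 3: input()) returns
as_value = py2_style_input(typed)  # what Python 2's input() returns

print(repr(as_text), as_value)     # '1 + 2 * 3' 7
```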

There is also a function for typing a password without echoing it to the screen ^[2]:

import getpass

pwd = getpass.getpass()

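No prompt text is needed here; it defaults to Password: (a different prompt can be passed as the first argument, e.g. getpass.getpass('PIN: ')).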

3. String manipulation

Getting a substring:

substr = str[start:end]

This gives the characters from index start up to, but not including, index end: start is inclusive, end is exclusive.
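A quick check of the slice boundaries, assuming Python 3; the string s is just an example. Either index can also be omitted or negative.

```python
s = "hello world"
head = s[0:5]    # 'hello': indices 0-4; index 5 (the space) is excluded
tail = s[6:]     # 'world': omitting end means "to the end"
first = s[:5]    # 'hello': omitting start means "from index 0"
last = s[-5:]    # 'world': negative indices count from the end

print(head, tail, first, last)
```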

RegExp^[3]

Python strings come with powerful regular expression support.
To use it, import re.

Replacing

newStr = re.sub(pattern, replacement, oriStr)

Here pattern (the text to be replaced) can be a regular expression, and every non-overlapping match in oriStr is replaced.
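A small re.sub sketch in Python 3; the date string is an arbitrary example. The r'...' prefix marks a raw string, which leaves backslashes alone so that \d reaches re unchanged. The optional count argument limits how many matches are replaced (the default, 0, means all of them).

```python
import re

date = "2014-12-08"
slashes = re.sub(r"-", "/", date)              # '2014/12/08': every match is replaced
masked = re.sub(r"\d", "#", date)              # '####-##-##': the pattern is a regex
first_only = re.sub(r"-", "/", date, count=1)  # '2014/12-08': only the first match

print(slashes, masked, first_only)
```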

Removing specific characters from the start / end of a string

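newStr = oriStr.lstrip('ch') # remove every leading character that is 'c' or 'h'
newStr = oriStr.rstrip('ch') # remove every trailing character that is 'c' or 'h'

The argument is treated as a set of characters, not as a substring, so lstrip('ch') and lstrip('hc') behave the same; with no argument, whitespace is removed.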
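A sketch showing the set-of-characters behavior, assuming Python 3; the example strings are made up. Stripping stops at the first character that is not in the set.

```python
s = "chchacheese"
a = s.lstrip("ch")             # 'acheese': stops at 'a', the first char not in {'c', 'h'}
b = s.lstrip("hc")             # 'acheese': order inside the argument does not matter
c = "xxhixx".rstrip("x")       # 'xxhi': only the trailing run is removed
d = "  padded  ".strip()       # 'padded': strip() works on both ends; no arg = whitespace

print(a, b, c, d)
```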

Finding every occurrence of a pattern ^[4]

keywords = re.finditer(pattern, text)
for match in keywords:
    startPos = match.start(0)  # index where the match begins
    endPos = match.end(0)      # index just past the end of the match
    keyword = match.group()    # the matched text itself
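A runnable version of the loop, assuming Python 3, with a made-up text and pattern. For comparison, re.findall returns only the matched strings (when the pattern has no groups), without positions.

```python
import re

text = "cat bat rat"
spans = []
for match in re.finditer(r"[cbr]at", text):
    # start(0)/end(0) refer to group 0, i.e. the whole match
    spans.append((match.start(0), match.end(0), match.group()))

print(spans)  # [(0, 3, 'cat'), (4, 7, 'bat'), (8, 11, 'rat')]

# findall: just the matched strings
words = re.findall(r"[cbr]at", text)
print(words)  # ['cat', 'bat', 'rat']
```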

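A few interesting things I noticed about regular expressions.
By default, quantifiers such as * and + are greedy: they match the longest substring they can.
To match the shortest substring instead (a non-greedy match), put ? after the quantifier.
For example, take <h1> title </h1>:
searching with <.*> finds the whole <h1> title </h1>,
but searching with <.*?> finds <h1> (and then </h1>).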
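The h1 example can be checked directly with re.findall (Python 3):

```python
import re

html = "<h1> title </h1>"
greedy = re.findall(r"<.*>", html)  # ['<h1> title </h1>']: .* grabs as much as it can
lazy = re.findall(r"<.*?>", html)   # ['<h1>', '</h1>']: .*? stops at the first '>'

print(greedy, lazy)
```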

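Next, escape characters.
To use a special symbol literally in a pattern (e.g. *, ., ...), escape it with \.
But to match a single \, the pattern has to be written '\\\\' (four \).
Representing a single \ in the replacement string also takes four \.
For example: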

str1 = re.sub('\\\\', 'zz', 'hihi \\ hihi')

str1 becomes hihi zz hihi (the \ is replaced with zz).
Conversely:

str2 = re.sub('zz', '\\\\', str1)

str2 becomes hihi \ hihi (i.e. the original string again).

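Both sides need four \ to represent one \ (found by experiment).
The reason is two rounds of escaping: the Python string literal '\\\\' first becomes the two characters \\, and re then reads \\ as one escaped, literal backslash.
re.sub also processes backslash escapes in the replacement string, which is why it needs four as well.
Raw strings (r'\\') skip the first round, so two backslashes are enough.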
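A sketch of the same experiment in Python 3, plus alternatives that avoid counting backslashes: raw strings, re.escape to build the pattern, and a function as the replacement (re.sub uses a function's return value as-is, without backslash processing).

```python
import re

s = "hihi \\ hihi"           # '\\' in a normal literal is ONE backslash character
assert len("\\\\") == 2       # four typed backslashes -> a 2-character string

# Normal literals: 4 backslashes -> 2 chars -> re sees one escaped backslash
str1 = re.sub("\\\\", "zz", s)     # 'hihi zz hihi'
str2 = re.sub("zz", "\\\\", str1)  # back to the original (repl escapes are processed too)

# Raw strings skip Python's own escape step, so two backslashes suffice
str3 = re.sub(r"\\", "zz", s)
str4 = re.sub("zz", r"\\", str3)

# re.escape builds the pattern; a function's return value is used verbatim
str5 = re.sub(re.escape("\\"), "zz", s)
str6 = re.sub("zz", lambda m: "\\", str5)

print(str1, str2, str3, str4, str5, str6)
```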

Reference

[1] http://www.tutorialspoint.com/python/python_files_io.htm
[2] http://pymotw.com/2/getpass/
[3] https://docs.python.org/2/library/re.html
[4] http://runnable.com/UqV9JYh03AFGAACS/how-to-use-findall-finditer-split-sub-and-subn-in-regular-expressions-in-python-for-regex

Monday, December 8, 2014
