Linux/grep


grep

Print only matching:

grep -o '[PATTERN]'
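
As a small illustration of -o, the sketch below pulls every id=NUMBER token out of one line. The input string and the variable name matches are made up for the example.

```shell
# Hypothetical input line; -o prints each match on its own line
# instead of printing the whole matching line.
matches=$(echo 'id=42 name=foo id=7' | grep -o 'id=[0-9]*')
echo "$matches"
```

Without -o, grep would print the full input line once.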

Get IP addresses:

ifconfig | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
IP=$(curl -s ip.oeey.com)
echo "$IP" | grep -o '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$'
echo $?  # 0 if the line matched, 1 if it did not
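
The exit-status check above can be wrapped in a small function. This is only a sketch: is_ipv4 is a hypothetical name, and like the pattern above it checks the shape of the string only, so 999.999.999.999 still passes.

```shell
# Hypothetical helper: succeeds if $1 looks like a dotted-quad IPv4 address.
# -E enables extended regex syntax, -q suppresses output so that only the
# exit status is used. Octet values are not range-checked.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

is_ipv4 "192.168.1.10" && echo "looks like an IP"
is_ipv4 "not-an-ip"    || echo "not an IP"
```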

Ignore fully commented and empty lines:

grep -v -e '^\s*#' -e '^$' [file]  # \s is a GNU extension; use [[:space:]] for portability
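
A quick way to see what this filter keeps is to run it on a throwaway file. The file contents below are invented for the example, and the sketch uses the portable [[:space:]] class.

```shell
# Hypothetical sample config written to a temporary file.
conf=$(mktemp)
printf '%s\n' '# a comment' '' 'port=8080' '  # indented comment' \
    'host=example  # trailing comment' > "$conf"

# Drop comment-only lines and empty lines. Lines with a trailing comment
# are kept unchanged, because the pattern is anchored at the line start.
active=$(grep -v -e '^[[:space:]]*#' -e '^$' "$conf")
echo "$active"
rm -f "$conf"
```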

OR (match any of several patterns):

grep -e pattern1 -e pattern2 filename
grep -E "pattern1|pattern2" filename
grep "pattern1\|pattern2" filename

ref: [1]
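
The three spellings should select the same lines. A minimal check with made-up sample text follows; note that \| in a basic regular expression is a GNU extension, so the third form is assumed to run under GNU grep.

```shell
# Hypothetical sample text, one item per line.
sample=$(printf '%s\n' 'apple pie' 'banana split' 'cherry tart')

a=$(printf '%s\n' "$sample" | grep -e apple -e cherry)   # multiple -e options
b=$(printf '%s\n' "$sample" | grep -E 'apple|cherry')    # extended regex alternation
c=$(printf '%s\n' "$sample" | grep 'apple\|cherry')      # GNU BRE alternation

[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all three forms agree"
```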

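Show Non-ASCII Characters

LC_ALL=C grep -a --color='auto' -P -n "[\x80-\xFF]" file.xml  # byte range, so force the C locale
grep -a --color='auto' -P -n "[^\x00-\x7F]" file.xml
echo '소녀시대' | LC_ALL=C grep -P "[\x80-\xFF]"  # in a UTF-8 locale this range covers only U+0080 to U+00FF and misses Hangul
grep -axv '.*' file.txt  # finds lines with invalid UTF-8, not non-ASCII text; prints nothing for a valid file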

Sample: https://www.w3.org/2001/06/utf-8-wrong/UTF-8-test.html

ref: https://stackoverflow.com/questions/3001177/how-do-i-grep-for-all-non-ascii-characters
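
A small self-contained check of the byte-range approach, assuming GNU grep built with -P (PCRE) support. The sample input is invented; the second line contains "café" encoded as UTF-8 bytes.

```shell
# Two hypothetical lines: pure ASCII, and one with a UTF-8 encoded e-acute
# (bytes 0xC3 0xA9). LC_ALL=C makes -P match raw bytes instead of code points.
hits=$(printf 'plain\ncaf\303\251\n' | LC_ALL=C grep -a -n -P '[\x80-\xFF]')
echo "$hits"
```

Only the second line is reported, prefixed with its line number.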

Detect Corrupted Unicode Characters

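awk '/[^\x00-\x7F]/{ print NR ":", $0 }' file.txt

This prints every line that contains a non-ASCII character, so the corrupted lines (mojibake) still have to be picked out by eye. Example:

$ awk '/[^\x00-\x7F]/{ print NR ":", $0 }' file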
1: Interruptor EC não está em DESLOCAR
4: 辅助驾驶室门关闭
5: Porte cab. aux. fermée
7: Дверь аппаратной камеры Ð·Ð°ÐºÑ€Ñ‹Ñ‚а
13: 高压ä¿æŠ¤æ‰‹æŸ„å‘下
14: Barrière descendue
16: Огранич. Планка ВВК опущ.
19: Barra de separação descida
22: DP未å¯åŠ¨
23: Puiss. rép. non activée
25: !!! ВнешнÑÑ Ð¼Ð¾Ñ‰Ð½Ð¾ÑÑ‚ÑŒ не включена
26: Potência Dist Não Ativada
28: Potência dist não activada
31: 机车未移动
33: Motor no se está moviendo
34: Локомотив неподвижен
35: Auto Não se Movendo
37: A não se move
40: 机车状况å…许自动åœæœº
41: Conditions auto\npermettent arrêt auto
43: УÑтановки локомотива\nПредуÑматривают Ð     °Ð²Ñ‚оматичеÑкую оÑтановку
44: Condições da moto\nPermitem Auto Parada

Ref: https://stackoverflow.com/questions/30738924/detecting-corrupt-characters-in-utf-8-encoded-text-file

--

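Assuming your locale is set to UTF-8 (check the output of the locale command), this finds lines that contain invalid UTF-8 sequences. It works because '.' cannot match an invalid byte, so with -x (whole-line match) and -v (invert) only lines that fail to decode are printed; -a makes grep treat the file as text: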

grep -axv '.*' file.txt

Ref: https://stackoverflow.com/questions/29465612/how-to-detect-invalid-utf8-unicode-binary-in-a-text-file
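
A sketch of this check on a throwaway file. The sample bytes and variable names are made up, and the locale lookup is an assumption: it takes the first UTF-8 locale reported by locale -a and falls back to C.UTF-8, which may not exist on every system.

```shell
# Assumption: at least one UTF-8 locale is installed; fall back to C.UTF-8.
utf8_locale=$(locale -a 2>/dev/null | grep -i -E 'utf-?8$' | head -n 1)
utf8_locale=${utf8_locale:-C.UTF-8}

# Three sample lines: plain ASCII, a stray 0xFF byte, and valid UTF-8.
sample=$(mktemp)
printf 'ok\nbad \377 byte\ncaf\303\251\n' > "$sample"

# -n adds line numbers; cut keeps only the number before the first colon.
bad_line_numbers=$(LC_ALL="$utf8_locale" grep -naxv '.*' "$sample" | LC_ALL=C cut -d: -f1)
echo "invalid UTF-8 on line(s): $bad_line_numbers"
rm -f "$sample"
```

The valid non-ASCII line is not reported, which is the difference from the non-ASCII searches earlier on this page.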
