Linux/grep


Print only matching:

grep -o '[PATTERN]'

Get IP addresses:

ifconfig | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
IP=$(curl -s ip.oeey.com)
echo "$IP" | grep -o '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$'
echo $?  # 0 if the pattern matched, 1 if it did not
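The pattern above only checks the shape of the address, so a string such as 999.999.999.999 also passes. A minimal sketch of a stricter check follows, assuming a POSIX shell and a grep that supports -E, -q and -x; the function name is_ipv4 is hypothetical and not part of grep.

```shell
# is_ipv4 is a hypothetical helper name, not a grep feature.
# The body runs in a subshell ( ... ) so the IFS change does not leak out.
is_ipv4() (
    # Shape check: the whole string must be four dot-separated groups of 1-3 digits.
    printf '%s\n' "$1" | grep -Eqx '[0-9]{1,3}(\.[0-9]{1,3}){3}' || exit 1
    # Range check: split on dots and reject any octet above 255.
    IFS=.
    for octet in $1; do
        [ "$octet" -le 255 ] || exit 1
    done
)

is_ipv4 "192.168.1.10" && echo "valid"
```

The function returns 0 for a well-formed address and 1 otherwise, so it can be used directly in an if statement.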

Ignore fully commented lines:

grep -v -e '^\s*#' -e '^$' [file]
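The same filter can be sketched as a single portable expression. This variant uses the POSIX class [[:space:]] instead of the GNU-specific \s, and, unlike the command above, it also drops lines that contain only whitespace. The function name strip_comments is hypothetical.

```shell
# strip_comments is a hypothetical helper name.
strip_comments() {
    # -E: extended regex; -v: print lines that do NOT match.
    # Drops lines that are empty, whitespace-only, or whose first
    # non-blank character is '#'. Trailing comments are left alone.
    grep -Ev '^[[:space:]]*(#|$)' "$@"
}

printf '%s\n' '# comment' '' 'key=value  # trailing comment stays' '   # indented' |
    strip_comments
```

Note that grep exits with status 1 when every line is filtered out, which matters in scripts that run under set -e.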

Show Non-ASCII Characters

grep -a --color='auto' -P -n "[\x80-\xFF]" file.xml
grep -a --color='auto' -P -n "[^\x00-\x7F]" file.xml
echo '소녀시대' | grep -P "[\x80-\xFF]"
grep -axv '.*' file.txt  # prints only lines with invalid UTF-8 (needs a UTF-8 locale); no output for valid files

Sample: https://www.w3.org/2001/06/utf-8-wrong/UTF-8-test.html

ref: https://stackoverflow.com/questions/3001177/how-do-i-grep-for-all-non-ascii-characters
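Whether these patterns match can depend on the locale: in a UTF-8 locale, -P may read \x80-\xFF as code points rather than bytes, so multi-byte characters can slip through. A minimal sketch that forces byte-wise matching is shown below. It assumes GNU grep built with -P (PCRE) support, and the function name has_non_ascii is hypothetical.

```shell
# has_non_ascii is a hypothetical helper name; assumes GNU grep with -P support.
has_non_ascii() {
    # LC_ALL=C makes grep compare raw bytes, so any byte outside 0x00-0x7F counts.
    # -a: treat the input as text; -q: no output, only set the exit status.
    LC_ALL=C grep -aqP '[^\x00-\x7F]' "$@"
}

# \303\251 is the UTF-8 byte sequence for e-acute.
printf 'caf\303\251\n' | has_non_ascii && echo "non-ASCII found"
```

Replacing -q with -n prints the matching lines with their line numbers instead of only setting the exit status.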

Detect Corrupted Unicode Characters

awk '/[^\x00-\x7F]/{ print NR ":", $0 }' file.txt
$ awk '/[^\x00-\x7F]/{ print NR ":", $0 }' file
1: Interruptor EC não está em DESLOCAR
4: 辅助驾驶室门关闭
5: Porte cab. aux. fermée
7: Дверь аппаратной камеры Ð·Ð°ÐºÑ€Ñ‹Ñ‚а
13: 高压ä¿æŠ¤æ‰‹æŸ„å‘下
14: Barrière descendue
16: Огранич. Планка ВВК опущ.
19: Barra de separação descida
22: DP未å¯åŠ¨
23: Puiss. rép. non activée
25: !!! ВнешнÑÑ Ð¼Ð¾Ñ‰Ð½Ð¾ÑÑ‚ÑŒ не включена
26: Potência Dist Não Ativada
28: Potência dist não activada
31: 机车未移动
33: Motor no se está moviendo
34: Локомотив неподвижен
35: Auto Não se Movendo
37: A não se move
40: 机车状况å…许自动åœæœº
41: Conditions auto\npermettent arrêt auto
43: УÑтановки локомотива\nПредуÑматривают Ð     °Ð²Ñ‚оматичеÑкую оÑтановку
44: Condições da moto\nPermitem Auto Parada

Ref: https://stackoverflow.com/questions/30738924/detecting-corrupt-characters-in-utf-8-encoded-text-file

--

Assuming your locale is set to UTF-8 (check the output of the locale command), this works well to recognize invalid UTF-8 sequences:

grep -axv '.*' file.txt

Ref: https://stackoverflow.com/questions/29465612/how-to-detect-invalid-utf8-unicode-binary-in-a-text-file
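Because the command depends on the active locale, an empty result in a non-UTF-8 locale says nothing about the file. A minimal sketch of a wrapper that checks the character map first is given below. It assumes the locale utility is available, and the function name find_invalid_utf8 and the exit code 2 are hypothetical choices.

```shell
# find_invalid_utf8 is a hypothetical helper name.
# Usage: find_invalid_utf8 file.txt
find_invalid_utf8() {
    # Refuse to run unless the active locale uses UTF-8; otherwise the
    # grep below is not a meaningful UTF-8 validity check.
    if [ "$(locale charmap 2>/dev/null)" != "UTF-8" ]; then
        echo "find_invalid_utf8: current locale is not UTF-8" >&2
        return 2
    fi
    # -a: treat input as text; -x: the whole line must match; -v: invert;
    # -n: show line numbers. '.*' cannot span an invalid byte sequence,
    # so only lines containing one are printed.
    grep -naxv '.*' "$@"
}
```

An exit status of 1 then means no invalid lines were found, 0 means at least one was printed, and 2 means the check was skipped.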
