Linux/grep
Revision as of 23:26, 23 November 2022
grep
Print only matching:
grep -o '[PATTERN]'
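A minimal sketch of what -o does, using made-up input: each match is printed on its own line, and lines without a match produce nothing. The variable name matches is only for illustration.

```shell
# Sample input is invented for the demo; only the digit runs are printed.
matches=$(printf 'error 42 at line 7\nno digits here\n' | grep -o '[0-9][0-9]*')
printf '%s\n' "$matches"
```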
Get IP addresses:
ifconfig | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
IP=$(curl -s ip.oeey.com)
echo "$IP" | grep -o '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$'
echo $?  # 0 if $IP looks like a dotted quad, 1 otherwise
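Note that the pattern above only checks the shape of the address, so a value such as 999.999.999.999 also passes. The following is one possible sketch of a stricter check. The function name is_ipv4 is hypothetical and not part of grep.

```shell
# is_ipv4 is a hypothetical helper: returns 0 for a dotted quad with octets 0-255.
is_ipv4() {
    # Shape check: four dot-separated groups of 1-3 digits (same pattern as above).
    printf '%s\n' "$1" |
        grep -q '^[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}$' || return 1
    # Range check: split on dots and make sure every octet is at most 255.
    for octet in $(printf '%s\n' "$1" | tr '.' ' '); do
        [ "$octet" -le 255 ] 2>/dev/null || return 1
    done
    return 0
}

is_ipv4 192.168.1.10 && echo "valid"
is_ipv4 999.1.1.1 || echo "invalid"
```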
Ignore fully commented and empty lines:
grep -v -e '^\s*#' -e '^$' [file]
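To see the effect of the filter, here is a small sketch against an invented config file. The file content and the variable names conf and active are assumptions for the demo. It uses [[:space:]], the POSIX spelling of the GNU \s shorthand.

```shell
# Hypothetical config content, written to a temporary file for the demo.
conf=$(mktemp)
printf '# a comment\n\nport=8080\n   # indented comment\nhost=example.org  # trailing note\n' > "$conf"

# Drop lines that are only a comment (optionally indented) and empty lines.
active=$(grep -v -e '^[[:space:]]*#' -e '^$' "$conf")
rm -f "$conf"
printf '%s\n' "$active"
```

Lines with a trailing comment after real content are kept, since only lines that start with # (after optional whitespace) are dropped.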
OR (match either pattern):
grep -e pattern1 -e pattern2 filename
grep -E "pattern1|pattern2" filename
grep "pattern1\|pattern2" filename  # \| in basic regex is a GNU extension
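The three OR forms should select the same lines. A quick sketch with invented sample text (the variable names are illustrative only):

```shell
sample='apple pie
banana split
cherry tart'

a=$(printf '%s\n' "$sample" | grep -e apple -e cherry)   # repeated -e
b=$(printf '%s\n' "$sample" | grep -E 'apple|cherry')    # extended regex alternation
c=$(printf '%s\n' "$sample" | grep 'apple\|cherry')      # GNU basic-regex alternation
printf '%s\n' "$a"
```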
Show Non ASCII Characters
grep -a --color='auto' -P -n "[\x80-\xFF]" file.xml
grep -a --color='auto' -P -n "[^\x00-\x7F]" file.xml
echo '소녀시대' | grep -P "[\x80-\xFF]"
grep -axv '.*' file.txt # prints lines that are not valid UTF-8; needs a UTF-8 locale, otherwise it prints nothing
Sample: https://www.w3.org/2001/06/utf-8-wrong/UTF-8-test.html
ref: https://stackoverflow.com/questions/3001177/how-do-i-grep-for-all-non-ascii-characters
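A small sketch of the byte-range approach on invented input. It assumes GNU grep built with -P (PCRE) support. LC_ALL=C is set here so the pattern is applied to raw bytes. In a UTF-8 locale, -P likely treats \x80-\xFF as code points instead, which is why the negated ASCII range is the more general form.

```shell
# \303\251 is the UTF-8 encoding of an accented e; the other lines are pure ASCII.
nonascii=$(printf 'plain ascii\ncaf\303\251\nmore ascii\n' |
    LC_ALL=C grep -a -n -P '[^\x00-\x7F]')
printf '%s\n' "$nonascii"
```

Only the second line is reported, prefixed with its line number.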
Detect Corrupted Unicode Characters
awk '/[^\x00-\x7F]/{ print NR ":", $0 }' file.txt  # flags every line with non-ASCII characters, valid or corrupted
$ awk '/[^\x00-\x7F]/{ print NR ":", $0 }' file
1: Interruptor EC não está em DESLOCAR
4: 辅助驾驶室门关é—
5: Porte cab. aux. fermée
7: Дверь аппаратной камеры Ð·Ð°ÐºÑ€Ñ‹Ñ‚а
13: é«˜åŽ‹ä¿æŠ¤æ‰‹æŸ„å‘下
14: Barrière descendue
16: Огранич. Планка ВВК опущ.
19: Barra de separação descida
22: DP未å¯åЍ
23: Puiss. rép. non activée
25: !!! ВнешнÑÑ Ð¼Ð¾Ñ‰Ð½Ð¾Ñть не включена
26: Potência Dist Não Ativada
28: Potência dist não activada
31: 机车未移动
33: Motor no se está moviendo
34: Локомотив неподвижен
35: Auto Não se Movendo
37: A não se move
40: 机车状况å…è®¸è‡ªåŠ¨åœæœº
41: Conditions auto\npermettent arrêt auto
43: УÑтановки локомотива\nПредуÑматривают Ð     °Ð²Ñ‚оматичеÑкую оÑтановку
44: Condições da moto\nPermitem Auto Parada
Ref: https://stackoverflow.com/questions/30738924/detecting-corrupt-characters-in-utf-8-encoded-text-file
--
Assuming your locale is set to UTF-8 (check the output of the locale command), this works well to find lines containing invalid UTF-8 sequences:
grep -axv '.*' file.txt
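If no UTF-8 locale is available, a whole-file cross-check is possible with iconv instead of grep. This is a different technique from the grep command above, sketched under the assumption that iconv is installed. The function name is_valid_utf8 and the temporary files are hypothetical.

```shell
# is_valid_utf8 is a hypothetical helper: exit status 0 if the file decodes as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

good=$(mktemp)
bad=$(mktemp)
printf 'caf\303\251\n' > "$good"   # complete two-byte sequence
printf 'caf\303\n' > "$bad"        # lead byte with no continuation byte

if is_valid_utf8 "$good"; then good_result=valid; else good_result=invalid; fi
if is_valid_utf8 "$bad"; then bad_result=valid; else bad_result=invalid; fi
rm -f "$good" "$bad"
echo "good: $good_result, bad: $bad_result"
```

Unlike grep -axv, this only says whether the file as a whole is valid; it does not point at the offending lines.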