
Forum PHP.pl

> Repetitive, similar fragments in a DOCTYPE regex: simplifying the table of obsolete permitted DOCTYPE strings
benio101
post 20.01.2014, 15:44:34
Post #1







Hi!

I wrote a simple regex that validates the DOCTYPE of an HTML document (W3C HTML 5.1):
echo "<pre>";

// Nowdoc keeps every backslash literal; the x flag lets the pattern span lines and carry # comments.
$regex=<<<'REGEX'
@(*UTF8)^
<!((?i)DOCTYPE)
(?<space_characters>[\x20\x09\x0A\x0C\x0D])+
((?i)HTML)
(
    \g<space_characters>+
    (
        ( # DOCTYPE legacy string
            ((?i)SYSTEM)
            \g<space_characters>+
            (?<quote_mark>["'])
            about:legacy-compat
            \k<quote_mark>
        )|( # obsolete permitted DOCTYPE string
            ((?i)PUBLIC)
            \g<space_characters>+
            (?<first_quote_mark>["'])
            (
                ( # HTML 4.0, system identifier optional
                    -//W3C//DTD\ HTML\ 4\.0//EN
                    \k<first_quote_mark>
                    (
                        \g<space_characters>+
                        (?<third_quote_mark_1>["'])
                        http://www\.w3\.org/TR/REC-html40/strict\.dtd
                        \k<third_quote_mark_1>
                    )?
                )|( # HTML 4.01, system identifier optional
                    -//W3C//DTD\ HTML\ 4\.01//EN
                    \k<first_quote_mark>
                    (
                        \g<space_characters>+
                        (?<third_quote_mark_2>["'])
                        http://www\.w3\.org/TR/html4/strict\.dtd
                        \k<third_quote_mark_2>
                    )?
                )|( # XHTML 1.0 Strict, system identifier required
                    -//W3C//DTD\ XHTML\ 1\.0\ Strict//EN
                    \k<first_quote_mark>
                    \g<space_characters>+
                    (?<third_quote_mark_3>["'])
                    http://www\.w3\.org/TR/xhtml1/DTD/xhtml1-strict\.dtd
                    \k<third_quote_mark_3>
                )|( # XHTML 1.1, system identifier required
                    -//W3C//DTD\ XHTML\ 1\.1//EN
                    \k<first_quote_mark>
                    \g<space_characters>+
                    (?<third_quote_mark_4>["'])
                    http://www\.w3\.org/TR/xhtml11/DTD/xhtml11\.dtd
                    \k<third_quote_mark_4>
                )
            )
        )
    )
)?
\g<space_characters>*
>
$@suxDX
REGEX;

// The leading digit in each label is the expected preg_match() result.
echo "0 - ";var_dump(preg_match($regex, '<!DOCTYPE>'));
echo "0 - ";var_dump(preg_match($regex, '<!DOCTYPE >'));
echo "1 - ";var_dump(preg_match($regex, '<!DOCTYPE htmL>'));
echo "1 - ";var_dump(preg_match($regex, '<!DOCTYPE htmL >'));
echo "0 - ";var_dump(preg_match($regex, '<!DOCTYPE htmL SYSTEM>'));
echo "1 - ";var_dump(preg_match($regex, '<!DOCTYPE htmL SYSTEM \'about:legacy-compat\' >'));
echo "1 - ";var_dump(preg_match($regex, '<!DOCTYPE htmL SYSTEM "about:legacy-compat">'));
echo "0 - ";var_dump(preg_match($regex, '<!DOCTYPE htmL SYSTEM "about:legacy-compat\'>'));
echo "0 - ";var_dump(preg_match($regex, '<!DOCTYPE htmL PUBLIC "about:legacy-compat">'));
echo "1 - ";var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">'));
echo "0 - ";var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//en">'));
echo "0 - ";var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//PL">'));
echo "1 - ";var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" \'http://www.w3.org/TR/REC-html40/strict.dtd\'>'));
echo "0 - ";var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" \'http://www4w3.org/TR/REC-html40/strict.dtd\'>'));
echo "0 - ";var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" \'http://www.w3.org/TR/REC-html40/strict.dtd\'>'));
echo "1 - ";var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">'));
echo "1 - ";var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" >'));
echo "0 - ";var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">'));

echo "</pre>";
However, I noticed that the obsolete permitted DOCTYPE string part of the pattern is largely repetitive, so my question is: how can this expression be simplified?
I considered reusing a single third_quote_mark group, but I cannot see how to do that for the table of public/system identifier pairs.
Do you have any ideas?
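
One possible direction, offered as a rough sketch rather than a definitive answer: keep the public/system identifier pairs in a PHP array and generate the repetitive alternatives from it. The names `$rows`, `$ws` and `quoted()` are hypothetical and do not come from the post. The sketch assumes it is acceptable to drop the named groups and the `x` and `X` flags, and to express "matching quotes" as an alternation of a double-quoted and a single-quoted literal instead of a backreference.

```php
<?php
// Hypothetical table: [public identifier, system identifier, system identifier required?]
$rows = [
    ['-//W3C//DTD HTML 4.0//EN',         'http://www.w3.org/TR/REC-html40/strict.dtd',        false],
    ['-//W3C//DTD HTML 4.01//EN',        'http://www.w3.org/TR/html4/strict.dtd',             false],
    ['-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd', true],
    ['-//W3C//DTD XHTML 1.1//EN',        'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd',      true],
];

// Same space characters as in the original pattern.
$ws = '[\x20\x09\x0A\x0C\x0D]';

// Build a sub-pattern matching $literal wrapped in a matching pair of " or '.
// preg_quote() escapes regex metacharacters, so the table holds plain strings.
function quoted($literal)
{
    $q = preg_quote($literal, '@');
    return "(?:\"$q\"|'$q')";
}

// One alternative per table row; the system identifier is optional for some rows.
$alternatives = [];
foreach ($rows as $row) {
    list($public, $system, $systemRequired) = $row;
    $systemPart = $ws . '+' . quoted($system);
    if (!$systemRequired) {
        $systemPart = '(?:' . $systemPart . ')?';
    }
    $alternatives[] = quoted($public) . $systemPart;
}

// No x flag here: the literals contain spaces that must stay significant.
$regex = '@^<!(?i:DOCTYPE)' . $ws . '+(?i:HTML)'
    . '(?:' . $ws . '+(?:'
    . '(?i:SYSTEM)' . $ws . '+' . quoted('about:legacy-compat')
    . '|(?i:PUBLIC)' . $ws . '+(?:' . implode('|', $alternatives) . ')'
    . '))?' . $ws . '*>$@suD';

// A few of the checks from the post; the label is the expected result.
echo "1 - "; var_dump(preg_match($regex, '<!DOCTYPE htmL>'));
echo "1 - "; var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">'));
echo "0 - "; var_dump(preg_match($regex, '<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">'));
```

With this layout, adding or changing a permitted DOCTYPE string means editing one row of the table, and the quote handling lives in a single helper. The trade-off is that the final pattern is assembled at runtime and is harder to read when dumped than the hand-written version.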