Forum PHP.pl

> [PHP][Regexp] Keep only (X) links in the text
barat
Post #1





Group: Registered
Posts: 183
Helped: 0
Joined: 19.05.2007


Hello. I have a small puzzle. Say I have the following text:

Lorem ipsum dolor sit <a href="#">amet</a>, consectetur adipiscing elit. Proin <a href="#">amet</a> elementum odio eget mauris ultricies vulputate. Suspendisse scelerisque vulputate risus ac lobortis. <a href="#">Aenean</a> euismod urna at libero vehicula non sollicitudin elit luctus. Aliquam ultricies nisi ac sapien tempus imperdiet sit amet ut nulla. Proin aliquam blandit libero eu ornare. Suspendisse a erat ligula. Phasellus ultricies odio nec metus dictum eget luctus augue interdum. Morbi at turpis libero, imperdiet iaculis sapien. In sed sapien eget turpis semper imperdiet. <a href="#">Nulla</a> iaculis blandit lorem, eget laoreet mauris euismod sed. Vivamus hendrerit euismod tellus, in adipiscing lectus eleifend in. <a href="#">Aliquam</a> imperdiet placerat orci ac ultrices. Curabitur eu sem tortor, at dapibus dolor.


What expression should I use to keep only the first 1/2/3/4/X links in this text, remove the rest, and leave just the text that was between <a></a>?
Assuming, of course, that:
  • We keep the links "from the left", i.e. the first 1/2/3/4/X
  • I never know how many links have been entered
  • The links do not have to be different (e.g. 2 occurrences of <a href="#">amet</a> plus a few other links; they may also sit right next to each other)
  • The links may have additional attributes, e.g. <a href="#" target="_blank" title="title" onclick="" class="" id="">aaa</a>


So, for example, for a "pseudo-function" zostaw_linki(2) the result (for the text above) would be:

Lorem ipsum dolor sit <a href="#">amet</a>, consectetur adipiscing elit. Proin <a href="#">amet</a> elementum odio eget mauris ultricies vulputate. Suspendisse scelerisque vulputate risus ac lobortis. Aenean euismod urna at libero vehicula non sollicitudin elit luctus. Aliquam ultricies nisi ac sapien tempus imperdiet sit amet ut nulla. Proin aliquam blandit libero eu ornare. Suspendisse a erat ligula. Phasellus ultricies odio nec metus dictum eget luctus augue interdum. Morbi at turpis libero, imperdiet iaculis sapien. In sed sapien eget turpis semper imperdiet. Nulla iaculis blandit lorem, eget laoreet mauris euismod sed. Vivamus hendrerit euismod tellus, in adipiscing lectus eleifend in. Aliquam imperdiet placerat orci ac ultrices. Curabitur eu sem tortor, at dapibus dolor.
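
One possible way to get this behaviour, as a minimal sketch: count the links inside a preg_replace_callback callback and return either the whole match or only the inner text. The function name keepFirstLinks and its parameters are made up for this illustration and are not taken from the thread.

```php
<?php
// Hypothetical helper (not from the thread): keeps the first $limit links
// and replaces every later <a ...>text</a> with just its inner text.
function keepFirstLinks(string $html, int $limit): string
{
    $seen = 0;

    return preg_replace_callback(
        // <a, a word boundary, any attributes, then the shortest possible body
        '~<a\b[^>]*>(.*?)</a>~is',
        function (array $m) use (&$seen, $limit): string {
            $seen++;
            // $m[0] is the whole link, $m[1] is the text between the tags.
            return $seen <= $limit ? $m[0] : $m[1];
        },
        $html
    );
}

$text = 'Lorem <a href="#">amet</a>, Proin <a href="#">amet</a> elit. '
      . '<a href="#" target="_blank" title="title">Aenean</a> euismod.';

echo keepFirstLinks($text, 2), "\n";
// Lorem <a href="#">amet</a>, Proin <a href="#">amet</a> elit. Aenean euismod.
```

The counter is captured by reference, so identical or adjacent links are simply counted in order of appearance. This assumes reasonably regular markup; for arbitrary HTML a DOM parser would be the safer tool.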


This post was edited by barat on 8.12.2009, 10:11:14
Replies
barat
Post #2







EDIT:
=======================

OK... it turns out everything works after all :)

=======================
I decided to "play around" with what you wrote.
I wanted to turn your regular expression into one that catches only external links that are also not from specific domains, e.g. domena.pl, domena1.com
That is:

* http://www.wp.pl - match
* http://wp.pl - match
* www.wp.pl - match
* /katalog/plik.ext - ignore
* http://domena.pl - ignore
* http://www.domena.pl - ignore
* http://www.domena.pl/kontroler/metoda/ - ignore
* www.domena.pl - ignore

I ended up with this little monster:
Code
<a[^\>]+href="(http://www\.|http://|www\.)(?!((www\.)?(domena\.pl|domena1\.com)))[^"]+"([^\>]+)?>([^\<]+)</a>


Pasting this expression into preg_match_all works without any problem.
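
As a quick sanity check before running it on a full text, a sketch like the one below wraps each URL from the list above in a minimal link and tests it against the expression. The helper name isWantedLink is hypothetical; the pattern is the one from this post, only written with ~ as the delimiter so the slashes need no escaping.

```php
<?php
// The expression from the post; domena.pl and domena1.com are the excluded domains.
$pattern = '~<a[^>]+href="(http://www\.|http://|www\.)'
         . '(?!((www\.)?(domena\.pl|domena1\.com)))[^"]+"([^>]+)?>([^<]+)</a>~i';

// Hypothetical helper: wraps a URL in a bare link and tests it against the pattern.
function isWantedLink(string $pattern, string $url): bool
{
    return preg_match($pattern, '<a href="' . $url . '">x</a>') === 1;
}

$urls = [
    'http://www.wp.pl',
    'http://wp.pl',
    'www.wp.pl',
    '/katalog/plik.ext',
    'http://domena.pl',
    'http://www.domena.pl',
    'http://www.domena.pl/kontroler/metoda/',
    'www.domena.pl',
];

foreach ($urls as $url) {
    // The first three print "match", the remaining five print "ignore".
    printf("%-40s %s\n", $url, isWantedLink($pattern, $url) ? 'match' : 'ignore');
}
```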

$str = <<<TEST
Welcome to RegExr 0.3b, an intuitive tool for learning, writing, and testing Regular Expressions. Key features include:

* real time results: shows results as you type
* code hinting: roll over your expression to see info on specific elements
* detailed results: roll over a match to see details & view group info below
<a href="http://www.wp.pl" title="aa">aaa</a> fdsg </a>
* built in regex guide: double click entries to insert them into your expression
* online & desktop: <a title="ar" href="http://regexr.com" name="saf">bb</a> or download the desktop version for Mac, Windows, or Linux
* save your expressions: My Saved expressions are saved locally
* share and rate expressions: search Community expressions and share your own <a href="http://www.szkolenia24h.pl/aaa/ssss">wew1</a>

Built by gskinner.com with Flex 3 [adobe.com/go/flex] and Spelling Plus <a href="http://szkolenia24h.pl/aaa/ssss">wew2</a> Library for text highlighting [gskinner.com/products/spl].
<a href="/aaa/ssss">wew3</a>
There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need to be sure there isn't anything embarrassing hidden in the middle of text. All the Lorem Ipsum generators on the Internet tend to repeat predefined chunks as necessary, making this the first true generator on the Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model<a href="www.wp.pl" title="asd">sentence structures</a>, to generate Lorem Ipsum which looks reasonable. The generated Lorem Ipsum is therefore always free from repetition, injected humour, or non-characteristic words etc.
There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need<a href="http://szkolenia24h.pl/aaa/ssss">wew2</a> to be sure there isn't anything embarrassing hidden in the middle of text. All the Lorem Ipsum generators on the Internet tend to repeat predefined chunks as necessary, making this the first true generator on the Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model sentence structures, to generate Lorem Ipsum which looks reasonable. The generated Lorem <a title="ar" href="http://regexr.com" name="saf">bb</a> Ipsum is freepetition, injected humour, or <a href="http://aaa.pl">noncg-haracteristic</a> words etc.
There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable.<a href="http://www.kursy24h.pl">aa</a> If you are going to use a passage of Lorem Ipsum, you need to be sure there isn't anything embarrassing hidden in the middle of text. All the Lorem Ipsum generators on the Internet tend to repeat predefined chunks as <a href="http://www.wp.pl" title="asd">noncg-haracteristic</a>necessary, making this the first true generator on the Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model sentence structures, to generate Lorem Ipsum which looks reasonable. The generated Lorem Ipsum is therefore always free from repetition, injected humour, or non-characteristic words etc.
TEST;

// Collect external links, skipping those pointing at szkolenia24h.pl and kursy24h.pl.
preg_match_all('/<a[^\>]+href="(http:\/\/www\.|http:\/\/|www\.)(?!((www\.)?(szkolenia24h\.pl|kursy24h\.pl)))[^"]+"([^\>]+)?>([^\<]+)<\/a>/is', $str, $linki);

print_r($linki);


However, using this rule in your class (after changing $vars[0]:$vars[1]; to $vars[0]:$vars[6];) does not work... I'm stumped :/
In theory it should behave the same way. The only difference between the expressions is that mine returns 7 elements instead of two:

(
[0] => Array
(
[0] => <a href="http://www.wp.pl" title="aa">aaa</a>
[1] => <a title="ar" href="http://regexr.com" name="saf">bb</a>
[2] => <a href="www.wp.pl" title="asd">sentence structures</a>
[3] => <a title="ar" href="http://regexr.com" name="saf">bb</a>
[4] => <a href="http://aaa.pl">noncg-haracteristic</a>
[5] => <a href="http://www.wp.pl" title="asd">noncg-haracteristic</a>
)

[1] => Array
(
[0] => http://www.
[1] => http://
[2] => www.
[3] => http://
[4] => http://
[5] => http://www.
)

[2] => Array
(
[0] =>
[1] =>
[2] =>
[3] =>
[4] =>
[5] =>
)

[3] => Array
(
[0] =>
[1] =>
[2] =>
[3] =>
[4] =>
[5] =>
)

[4] => Array
(
[0] =>
[1] =>
[2] =>
[3] =>
[4] =>
[5] =>
)

[5] => Array
(
[0] => title="aa"
[1] => name="saf"
[2] => title="asd"
[3] => name="saf"
[4] =>
[5] => title="asd"
)

[6] => Array
(
[0] => aaa
[1] => bb
[2] => sentence structures
[3] => bb
[4] => noncg-haracteristic
[5] => noncg-haracteristic
)
)


vs

(
[0] => Array
(
[0] => <a href="http://www.wp.pl" title="aa">aaa</a>
[1] => <a title="ar" href="http://regexr.com" name="saf">bb</a>
[2] => <a href="http://www.szkolenia24h.pl/aaa/ssss">wew1</a>
[3] => <a href="http://szkolenia24h.pl/aaa/ssss">wew2</a>
[4] => <a href="/aaa/ssss">wew3</a>
[5] => <a href="www.wp.pl" title="asd">sentence structures</a>
[6] => <a href="http://szkolenia24h.pl/aaa/ssss">wew2</a>
[7] => <a title="ar" href="http://regexr.com" name="saf">bb</a>
[8] => <a href="http://aaa.pl">noncg-haracteristic</a>
[9] => <a href="http://www.kursy24h.pl">aa</a>
[10] => <a href="http://www.wp.pl" title="asd">noncg-haracteristic</a>
)

[1] => Array
(
[0] => aaa
[1] => bb
[2] => wew1
[3] => wew2
[4] => wew3
[5] => sentence structures
[6] => wew2
[7] => bb
[8] => noncg-haracteristic
[9] => aa
[10] => noncg-haracteristic
)

)
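
The extra elements come from the additional parentheses: every (...) in the expression is a capturing group, and the groups inside the negative lookahead never hold a value, which is why indexes 2 to 4 are empty. A possible way to keep the link text at index 1, so that code reading $vars[1] would not need to change, is to make the helper groups non-capturing with (?:...). This is only a sketch of that idea; the class from the earlier reply is not shown in this thread, so how it consumes the matches is an assumption.

```php
<?php
// Same filter as in the post, but every helper group is non-capturing,
// so the only captured group is the link text and it stays at index 1.
$pattern = '~<a[^>]+href="(?:http://www\.|http://|www\.)'
         . '(?!(?:www\.)?(?:szkolenia24h\.pl|kursy24h\.pl))[^"]+"[^>]*>([^<]+)</a>~i';

// Shortened sample input, built from links that appear in the test text above.
$html = '<a href="http://www.wp.pl" title="aa">aaa</a> '
      . '<a href="http://www.szkolenia24h.pl/aaa/ssss">wew1</a> '
      . '<a href="/aaa/ssss">wew3</a> '
      . '<a title="ar" href="http://regexr.com" name="saf">bb</a>';

preg_match_all($pattern, $html, $linki);

// $linki now has exactly two elements: [0] full links, [1] link texts.
print_r($linki);
```

With this input $linki[1] contains aaa and bb, the texts of the two links that are external and not on the excluded domains.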


This post was edited by barat on 18.12.2009, 13:11:14