Java regex to extract text between tags


inputbox 2020. 10. 23. 07:53

I have a file with some custom tags, and I'd like to write a regular expression that extracts the string between the tags. For example, if my tag is:

[customtag]String I want to extract[/customtag]

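How would I write a regular expression that extracts only the string between the tags? This code looks like a step in the right direction:

// Square brackets are regex metacharacters, so they are escaped to match literally.
Pattern p = Pattern.compile("\\[customtag\\](.+?)\\[/customtag\\]");
Matcher m = p.matcher("[customtag]String I want to extract[/customtag]");

I'm not sure what to do next. Any ideas? Thanks.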

You're on the right track. Now you just need to extract the desired group, as follows:

final Pattern pattern = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);
final Matcher matcher = pattern.matcher("<tag>String I want to extract</tag>");
matcher.find();
System.out.println(matcher.group(1)); // Prints String I want to extract

If you want to extract multiple hits, try this:

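import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; it only lets the snippet compile on its own.
public class TagValues {
    public static void main(String[] args) {
        final String str = "<tag>apple</tag><b>hello</b><tag>orange</tag><tag>pear</tag>";
        System.out.println(Arrays.toString(getTagValues(str).toArray())); // Prints [apple, orange, pear]
    }

    private static final Pattern TAG_REGEX = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);

    // Collects group 1 (the text between the tags) of every match in the input.
    private static List<String> getTagValues(final String str) {
        final List<String> tagValues = new ArrayList<>();
        final Matcher matcher = TAG_REGEX.matcher(str);
        while (matcher.find()) {
            tagValues.add(matcher.group(1));
        }
        return tagValues;
    }
}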

However, I agree that regular expressions are not the best answer here. I'd use XPath to find the elements I'm interested in. See the Java XPath API for more information.
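As a rough illustration of the XPath suggestion, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath package. It assumes the input is well-formed XML with a single root element, which is why the sample fragments are wrapped in a <root> element; the class name XPathTagValues and that wrapper are assumptions made for the example, not part of the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named only for this example.
class XPathTagValues {

    // Returns the text content of every <tag> element in a well-formed XML string.
    static List<String> getTagValues(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // "//tag" selects all <tag> elements at any depth, in document order.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//tag", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the fragments are wrapped in <root> so the input has one root element.
        String xml = "<root><tag>apple</tag><b>hello</b><tag>orange</tag><tag>pear</tag></root>";
        System.out.println(getTagValues(xml)); // Prints [apple, orange, pear]
    }
}
```

This only applies when the file really is XML; bracket-style custom tags such as [customtag] would need a different parser.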

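To be quite honest, regular expressions are not the best idea for this type of parsing. The regular expression you posted will probably work well for simple cases, but if things get more complex you are going to have huge problems (for the same reason you cannot reliably parse HTML with regular expressions). I know you probably don't want to hear this; I know I didn't when I asked the same type of question. But string parsing became much more reliable for me once I stopped trying to use regular expressions for everything.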
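To make the "more complex cases" concrete, the following sketch runs the <tag>(.+?)</tag> pattern from the earlier answer against a few inputs. The class and method names are made up for the illustration, and the inputs are only examples of the kind of variation that tends to show up in real files.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named only for this example.
class RegexEdgeCases {
    static final Pattern TAG_REGEX = Pattern.compile("<tag>(.+?)</tag>", Pattern.DOTALL);

    // Returns the first captured tag value, or null when the pattern finds nothing.
    static String firstTagValue(String input) {
        Matcher m = TAG_REGEX.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Simple case: works as intended.
        System.out.println(firstTagValue("<tag>apple</tag>")); // apple

        // Nested tags: the match stops at the first closing tag, cutting the content short.
        System.out.println(firstTagValue("<tag>outer <tag>inner</tag> text</tag>")); // outer <tag>inner

        // An attribute on the opening tag: the literal "<tag>" no longer matches.
        System.out.println(firstTagValue("<tag id=\"1\">apple</tag>")); // null

        // Empty element: .+? requires at least one character.
        System.out.println(firstTagValue("<tag></tag>")); // null
    }
}
```

Each failure can be patched with a more elaborate pattern, but every patch tends to open further edge cases, which is the point being made above.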

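jTopas is an awesome tokenizer that makes it quite easy to write parsers by hand (I prefer it to the standard Java Scanner and similar libraries). If you want to see jTopas in action, here are some parsers I wrote with jTopas to parse this type of file.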

If you are parsing XML files, you should use an XML parser library. Don't do it yourself unless you are doing it just for fun; there are plenty of proven options out there.
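One of those proven options ships with the JDK. The sketch below uses the standard DOM parser (javax.xml.parsers) to collect the text of every element with a given name. The class name DomTagValues, the sample document, and the choice of DOM over other parsers are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, named only for this example.
class DomTagValues {

    // Parses the XML string and returns the text content of each element named tagName.
    static List<String> getTagValues(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Malformed XML, I/O problems, and parser configuration errors all end up here.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The attribute on the first <tag> is handled by the parser without extra work.
        String xml = "<root><tag id=\"1\">apple</tag><b>hello</b><tag>pear</tag></root>";
        System.out.println(getTagValues(xml, "tag")); // Prints [apple, pear]
    }
}
```

Unlike the regex version, this needs no changes when attributes, whitespace, or nesting appear in the input, as long as the document is well-formed.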

A generic, simple, and somewhat primitive approach to finding the tag, attributes, and value:

    // Group 1: tag name, group 2: attributes, groups 3 and 4: text between the tags.
    Pattern pattern = Pattern.compile("<(\\w+)( +.+)*>((.*))</\\1>");
    System.out.println(pattern.matcher("<asd> TEST</asd>").find());          // true
    System.out.println(pattern.matcher("<asd TEST</asd>").find());           // false: opening tag is never closed
    System.out.println(pattern.matcher("<asd attr='3'> TEST</asd>").find()); // true
    System.out.println(pattern.matcher("<asd> <x>TEST<x>asd>").find());      // false: no closing tag
    System.out.println("-------");
    Matcher matcher = pattern.matcher("<as x> TEST</as>");
    if (matcher.find()) {
        // Group 0 is the whole match; the remaining groups are listed in order.
        for (int i = 0; i <= matcher.groupCount(); i++) {
            System.out.println(i + ":" + matcher.group(i));
        }
    }

Try this:

// any_tag and anyString are placeholders for your tag name and input string.
Pattern p = Pattern.compile("(?<=\\<(any_tag)\\>)(\\s*.*?\\s*)(?=\\<\\/(any_tag)\\>)");
Matcher m = p.matcher(anyString);

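For example:

String str = "<TR> <TD>1Q Ene</TD> <TD>3.08%</TD> </TR>";
// The reluctant .*? stops at the nearest </TD>; a greedy .* would run on to the last one.
Pattern p = Pattern.compile("(?<=\\<TD\\>)(\\s*.*?\\s*)(?=\\<\\/TD\\>)");
Matcher m = p.matcher(str);
while (m.find()) {
    // Log.e is the Android logger (android.util.Log).
    Log.e("Regex", " Regex result: " + m.group());
}

Output:

1Q Ene

3.08%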

    final Pattern pattern = Pattern.compile("tag\\](.+?)\\[/tag");
    final Matcher matcher = pattern.matcher("[tag]String I want to extract[/tag]");
    matcher.find();
    System.out.println(matcher.group(1));

I preface this reply with: "You shouldn't use a regular expression to parse XML. It will only result in edge cases that don't work right and a regex that keeps growing in complexity while you try to fix it."

That being said, you need to proceed by matching the string and grabbing the group you want:

if (m.matches())
{
   String result = m.group(1);
   // do something with result
}
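Note that matches() and find() behave differently, which matters when choosing between this answer and the earlier ones: matches() succeeds only if the whole input matches the pattern, while find() looks for the pattern anywhere in the input. A minimal sketch of the difference, using the bracket-style tag from the question (the class and method names are invented for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named only for this example.
class MatchesVersusFind {
    static final Pattern TAG = Pattern.compile("\\[customtag\\](.+?)\\[/customtag\\]");

    // Returns group 1 only when the ENTIRE input is a single tagged string.
    static String wholeInput(String input) {
        Matcher m = TAG.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Returns group 1 of the first tagged string found anywhere in the input.
    static String firstOccurrence(String input) {
        Matcher m = TAG.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String exact = "[customtag]String I want to extract[/customtag]";
        String embedded = "before [customtag]String I want to extract[/customtag] after";

        System.out.println(wholeInput(exact));         // String I want to extract
        System.out.println(wholeInput(embedded));      // null
        System.out.println(firstOccurrence(embedded)); // String I want to extract
    }
}
```

If the tags can appear in the middle of a longer line or file, find() in a loop is usually the better fit.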

    String s = "<B><G>Test</G></B><C>Test1</C>";
    String pattern = "\\<(.+)\\>([^\\<\\>]+)\\<\\/\\1\\>";
    int count = 0;

    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(s);
    while (m.find())
    {
        System.out.println(m.group(2));
        count++;
    }

Reference URL: https://stackoverflow.com/questions/6560672/java-regex-to-extract-text-between-tags
