program story

문자열에서 부동 숫자를 추출하는 방법

inputbox 2020. 9. 15. 07:47
반응형

문자열에서 부동 숫자를 추출하는 방법 [중복]


이 질문에 이미 답변이 있습니다.

비슷한 문자열 Current Level: 13.4 db.이 많이 있는데 부동 소수점 숫자 만 추출하고 싶습니다. 나는 때때로 전체이기 때문에 소수가 아닌 부동이라고 말합니다. RegEx가 이것을 할 수 있습니까? 아니면 더 좋은 방법이 있습니까?


부동 소수점이 항상 10 진수 표기법으로 표현된다면

>>> import re
>>> re.findall("\d+\.\d+", "Current Level: 13.4 db.")
['13.4']

충분할 수 있습니다.

더 강력한 버전은 다음과 같습니다.

>>> re.findall(r"[-+]?\d*\.\d+|\d+", "Current Level: -13.2 db or 14.2 or 3")
['-13.2', '14.2', '3']

사용자 입력의 유효성을 검사하려면 직접 단계를 수행하여 부동 소수점을 확인할 수도 있습니다.

user_input = "Current Level: 1e100 db"
for token in user_input.split():
    try:
        # if this succeeds, you have your (first) float
        print float(token), "is a float"
    except ValueError:
        print token, "is something else"

# => Would print ...
#
# Current is something else
# Level: is something else
# 1e+100 is a float
# db is something else

숫자 뒤의 공백에 의존하지 않는 것을 포함하여 모든 기본을 포함하는 다음과 같은 것을 시도해 볼 수 있습니다.

>>> import re
>>> numeric_const_pattern = r"""
...     [-+]? # optional sign
...     (?:
...         (?: \d* \. \d+ ) # .1 .12 .123 etc 9.1 etc 98.1 etc
...         |
...         (?: \d+ \.? ) # 1. 12. 123. etc 1 12 123 etc
...     )
...     # followed by optional exponent part if desired
...     (?: [Ee] [+-]? \d+ ) ?
...     """
>>> rx = re.compile(numeric_const_pattern, re.VERBOSE)
>>> rx.findall(".1 .12 9.1 98.1 1. 12. 1 12")
['.1', '.12', '9.1', '98.1', '1.', '12.', '1', '12']
>>> rx.findall("-1 +1 2e9 +2E+09 -2e-9")
['-1', '+1', '2e9', '+2E+09', '-2e-9']
>>> rx.findall("current level: -2.03e+99db")
['-2.03e+99']
>>>

간편한 복사-붙여 넣기 :

numeric_const_pattern = '[-+]? (?: (?: \d* \. \d+ ) | (?: \d+ \.? ) )(?: [Ee] [+-]? \d+ ) ?'
rx = re.compile(numeric_const_pattern, re.VERBOSE)
rx.findall("Some example: Jr. it. was .23 between 2.3 and 42.31 seconds")

Python 문서 에는 +/- 및 지수 표기법에 대한 답변이 있습니다.

scanf() Token      Regular Expression
%e, %E, %f, %g     [-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?
%i                 [-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)

This regular expression does not support international formats where a comma is used as the separator character between the whole and fractional part (3,14159). In that case, replace all \. with [.,] in the above float regex.

                        Regular Expression
International float     [-+]?(\d+([.,]\d*)?|[.,]\d+)([eE][-+]?\d+)?

re.findall(r"[-+]?\d*\.?\d+|\d+", "Current Level: -13.2 db or 14.2 or 3")

as described above, works really well! One suggestion though:

re.findall(r"[-+]?\d*\.?\d+|[-+]?\d+", "Current Level: -13.2 db or 14.2 or 3 or -3")

will also return negative int values (like -3 in the end of this string)


You can use the following regex to get integer and floating values from a string:

re.findall(r'[\d\.\d]+', 'hello -34 42 +34.478m 88 cricket -44.3')

['34', '42', '34.478', '88', '44.3']

Thanks Rex


I think that you'll find interesting stuff in the following answer of mine that I did for a previous similar question:

https://stackoverflow.com/q/5929469/551449

In this answer, I proposed a pattern that allows a regex to catch any kind of number and since I have nothing else to add to it, I think it is fairly complete


Another approach that may be more readable is simple type conversion. I've added a replacement function to cover instances where people may enter European decimals:

>>> for possibility in "Current Level: -13.2 db or 14,2 or 3".split():
...     try:
...         str(float(possibility.replace(',', '.')))
...     except ValueError:
...         pass
'-13.2'
'14.2'
'3.0'

This has disadvantages too however. If someone types in "1,000", this will be converted to 1. Also, it assumes that people will be inputting with whitespace between words. This is not the case with other languages, such as Chinese.

참고URL : https://stackoverflow.com/questions/4703390/how-to-extract-a-floating-number-from-a-string

반응형